Gerald & Wheatley, p. 45
Again, understanding your function can help avoid this
sort of problem.
The false position method, because it sometimes keeps "older" estimates,
does not converge as quickly as the secant method. However, because it
is strict about keeping the root bracketed, it is less vulverable to the
secant method's pathology. Whether the tradeoff is worth the slower convergence
depends on several things: How badly behaved the function is, how critical
rapid convergence is, and how much overhead is involved in doing a function
evaluation. If the function evaluation is especially complex, we may want
to make as few as possible. On the other hand, if evaluating the function
is simple, we may prefer the security of false position, as a few extra function
evaluations (iterations of the algorithm) is a small price to pay.
Newton's Method
Newton's method requires additional information about the function, namely
the first derivative. Sometimes this is easy to obtain, sometimes not. Also,
Newton's method should be used with caution because it is vulnerable to certain
pathologies, mostly a local extremum (minimum or maximum) located near the
root. Also, although Newton converges very rapidly near the root,
its global convergence is very poor. Thus, one should use some other method
(understanding of the function, inspection of the function's graph, etc)
to at least get into the neighborhood of the root.
Newton's method works by using the first derivative (instantaneous slope)
to estimate where the root lies. The tangent line is extrapolated to intersect
the x axis, which is taken as the next estimate:
Press et al., p. 288. Full reference below.
Convergence near root is
very rapid: Quadratic convergence, meaning
number of significant figures
doubles each step.
The formula used is quite simple:
We evaluate the function and its first derivative at our initial estimate,
then use that information to generate the next estimate, repeating until
our function evaluation is "small enough" or two successive estimates are
"close enough" that we declare ourselves satisfied.
So why not use Newton all the time? We may not have an efficient evaluation
of f'(x) available, or any evaluation of it available at all. Also, a local
extremum (minimum or maximum) near the root can be the death of the method,
because we divide by the value of the derivative, which is 0 at the extremum:
Press et al., p. 288.
If we land on the extremum we divide by zero and our program crashes;
if we land too close to the extremum, our next estimate sends us off on
a wild-goose chase with little hope of finding our way back. Thus, most
practical implementations do some sort of check on the magnitude of the
derivative before carrying out the division. If the value of the derivative
is too small, a safer method such as false position, or even simple bisection,
is used instead. (This again requires some extra bookkeeping, as Newton doesn't
require the root to be bracketed and needs only one estimate at a time.)
Other Methods
Mueller's method is often used as a subsitute for one
of the interpolation methods above. It uses three estimates and interpolates
a second-degree equation (a parabola) to approximate the function near the
root. Because the interpolated function is nonlinear, it converges much more
rapidly than the linear methods mentioned here, and the computational cost,
while higher, is still small enough to be worth it. Other methods such as
spline interpolation are sometimes used for root finding; the basic
idea (fitting a polynomial and using the root of the polynomial as the next
estimate) remains the same.
Multidimensional Root Finding
Finding roots in multiple dimensions can be much more
difficult. For example, if z = f(x, y), the roots will be coutours in the
xy plane where z = 0. Mapping these out can be difficult, and finding them
numerically even more so. However, if we know in advance where a root is likely
to lie, the situation becomes more tractable. If there are several independent
variables and only one dependent variable, the methods we've discussed so
far generalize to multidimensions fairly easily. For instance, instead of
the first derviative in the one-dimensional case, we can use the gradient
of a multidimensional function to carry out similar estimates for each of
the variables involved. Of course, in this case, if any of the partial derivatives
are zero, we will again have problems. Interpolation in multiple dimensions
avoids these difficulties but converges more slowly. For polynomials, the
higher the degree of the polynomial, the more difficult it is to extract
the roots correctly; small errors can send the estimates all over the place.
Programming around these difficulties is somewhat tedious and at any rate
is well beyond the scope of this introduction.
Systems of multiple equations in multiple unknowns are much more
difficult to solve. Indeed, there are no general-purpose methods which work
well in all cases. Press et al. has a very good discussion of the issues
involved. However, once we have (somehow) determined an area where a root
is known to lie, Newton's method generalized to the appropriate number of
dimensions works very well.
References:
Gerald, C. F., Wheatley, P. O. Applied Numerical Analysis. 6th
Edition. 1999, Addison Wesley.
Hanly, J. R.. Essential C++ for Engineers and Scientists. 2nd Edition.
2002, Addison Wesley.
Press, W.H., Flannery, B. P., Teukolsky, S. A., Vetterling, W. T.
Numerical
Recipes in Pascal: The Art of Scientific Computing. 1989, Cambridge University
Press.