A Brief Introduction to Root Finding


Why bother?

At root r of function f(x), f(r) = 0.
Most methods rely on starting with some initial estimate and successively improving it.
We can determine we have a root inside an interval if the function changes sign from one end of the interval to the other. Note that we may have two roots, or any even number of roots, within an interval and the "sign-change" indicator would not work. Likewise, if there is any odd number of roots, the sign will change. This is where understanding the function is important, and graphing the function may be helpful. No programming trick or numerical method can replace having a good understanding of your function.  

Bisection
    One of the most straightforward methods is bisection.
bisection illustration
Illustration from p. 13 of Gerald & Wheatley, credits below

Beginning with x1 and x2; each successive estimate halves the remaining interval where the root is located. Continue until two successive estimates are "close enough" or the most recent function evaluation is "close enough" to zero.

Interpolation Methods

Here's an example for some arbitrary function f(x). Pick two points x0 and x1, such that x1 is closer to the root (that is, | f(x1) | < | f(x0) | ), swapping them if necessary to accomplish this. We then extend the line running through f(x0) and f(x1) until it intersects the x axis:

Secant (basic method)
Illustration from p. 44 of Gerald and Wheatley. Full credits below.

From the similar triangles in the above illustration, you should be able to convince yourself that

Secant equations

And if we solve this equation for x2, we have

Secant Equation
Ref: Gerald & Wheatley, p. 44

This gives us the basic interpolating equation.

There are two methods which use this approach. The only difference between them is how they select points and how x2 is used when it is found. In the secant method, the two most recent points are used. In this case, x0 would be thrown out, x1 would become the 'new' x0, and x2 would replace x1. The procedure would then repeat with the generation of a new x2.

In the regula falsi, or false position method, we use x2 to replace either x0 or x1, keeping the root bracketed at all times. So instead of replacing the oldest estimate, x2 is used to replace whichever endpoint has the same sign.  The secant method does not require the root remain bracketed, and it is therefore slightly more vulnerable to certain pathologies. If the function is poorly behaved (e.g. has a local minimum or maximum) near the area being searched, it may go off in the wrong direction:
Secnat pathology
Gerald & Wheatley, p. 45

Again, understanding your function can help avoid this sort of problem.

The false position method, because it sometimes keeps "older" estimates, does not converge as quickly as the secant method. However, because it is strict about keeping the root bracketed, it is less vulverable to the secant method's pathology. Whether the tradeoff is worth the slower convergence depends on several things: How badly behaved the function is, how critical rapid convergence is, and how much overhead is involved in doing a function evaluation. If the function evaluation is especially complex, we may want to make as few as possible. On the other hand, if evaluating the function is simple, we may prefer the security of false position, as a few extra function evaluations (iterations of the algorithm) is a small price to pay. 

Newton's Method

Newton's method requires additional information about the function, namely the first derivative. Sometimes this is easy to obtain, sometimes not. Also, Newton's method should be used with caution because it is vulnerable to certain pathologies, mostly a local extremum (minimum or maximum) located near the root. Also, although Newton converges very rapidly near the root, its global convergence is very poor. Thus, one should use some other method (understanding of the function, inspection of the function's graph, etc) to at least get into the neighborhood of the root.  

Newton's method works by using the first derivative (instantaneous slope) to estimate where the root lies. The tangent line is extrapolated to intersect the x axis, which is taken as the next estimate:


newton diagram
Press et al., p. 288. Full reference below.


Convergence near root is very rapid: Quadratic convergence, meaning number of significant figures doubles each step.

The formula used is quite simple:

newton equation

We evaluate the function and its first derivative at our initial estimate, then use that information to generate the next estimate, repeating until our function evaluation is "small enough" or two successive estimates are "close enough" that we declare ourselves satisfied.

So why not use Newton all the time? We may not have an efficient evaluation of f'(x) available, or any evaluation of it available at all. Also, a local extremum (minimum or maximum) near the root can be the death of the method, because we divide by the value of the derivative, which is 0 at the extremum:

newton goof
Press et al., p. 288.


If we land on the extremum we divide by zero and our program crashes; if we land too close to the extremum, our next estimate sends us off on a wild-goose chase with little hope of finding our way back. Thus, most practical implementations do some sort of check on the magnitude of the derivative before carrying out the division. If the value of the derivative is too small, a safer method such as false position, or even simple bisection, is used instead. (This again requires some extra bookkeeping, as Newton doesn't require the root to be bracketed and needs only one estimate at a time.)

Other Methods  

Mueller's method is often used as a subsitute for one of the interpolation methods above. It uses three estimates and interpolates a second-degree equation (a parabola) to approximate the function near the root. Because the interpolated function is nonlinear, it converges much more rapidly than the linear methods mentioned here, and the computational cost, while higher, is still small enough to be worth it. Other methods such as spline interpolation are sometimes used for root finding; the basic idea (fitting a polynomial and using the root of the polynomial as the next estimate) remains the same.



Multidimensional Root Finding

Finding roots in multiple dimensions can be much more difficult. For example, if z = f(x, y), the roots will be coutours in the xy plane where z = 0. Mapping these out can be difficult, and finding them numerically even more so. However, if we know in advance where a root is likely to lie, the situation becomes more tractable. If there are several independent variables and only one dependent variable, the methods we've discussed so far generalize to multidimensions fairly easily. For instance, instead of the first derviative in the one-dimensional case, we can use the gradient of a multidimensional function to carry out similar estimates for each of the variables involved. Of course, in this case, if any of the partial derivatives are zero, we will again have problems. Interpolation in multiple dimensions avoids these difficulties but converges more slowly. For polynomials, the higher the degree of the polynomial, the more difficult it is to extract the roots correctly; small errors can send the estimates all over the place. Programming around these difficulties is somewhat tedious and at any rate is well beyond the scope of this introduction.

Systems of multiple equations in multiple unknowns are much more difficult to solve. Indeed, there are no general-purpose methods which work well in all cases. Press et al. has a very good discussion of the issues involved. However, once we have (somehow) determined an area where a root is known to lie, Newton's method generalized to the appropriate number of dimensions works very well.

References:  

Gerald, C. F., Wheatley, P. O. Applied Numerical Analysis. 6th Edition. 1999, Addison Wesley.

Hanly, J. R.. Essential C++ for Engineers and Scientists. 2nd Edition. 2002, Addison Wesley.

Press, W.H., Flannery, B. P., Teukolsky, S. A., Vetterling, W. T. Numerical Recipes in Pascal: The Art of Scientific Computing. 1989, Cambridge University Press.