
Appendix D: Nonlinear Least Squares Data Fitting




Introduction

A nonlinear least squares problem is an unconstrained minimization problem of the form

$$ \min_x \; f(x) = \sum_{i=1}^{m} f_i(x)^2, $$

where the objective function is defined in terms of auxiliary functions $\{f_i\}$. It is called "least squares" because we are minimizing the sum of squares of these functions. Looked at in this way, it is just another example of unconstrained minimization, leading one to ask why it should be studied as a separate topic. There are several reasons.

In the context of data fitting, the auxiliary functions $\{f_i\}$ are not arbitrary nonlinear functions. They correspond to the residuals in a data fitting problem (see Chapter 1). For example, suppose that we had collected data $\{(t_i, y_i)\}_{i=1}^{m}$ consisting of the size of a population of antelope at various times. Here $t_i$ corresponds to the time at which the population $y_i$ was counted. Suppose we had the data

  $t_i$:  1   2   4   5   8
  $y_i$:  3   4   6  11  20

where the times are measured in years and the populations are measured in hundreds. It is common to model populations using exponential models, and so we might hope that

$$ y_i \approx x_1 e^{x_2 t_i} $$

for appropriate choices of the parameters $x_1$ and $x_2$.
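As a concrete illustration of the objective just defined, the following short sketch (not from the original text; it assumes NumPy and uses the antelope data above) evaluates $f(x) = \sum_i (x_1 e^{x_2 t_i} - y_i)^2$ at a trial choice of the parameters.

```python
import numpy as np

# Antelope data: times t_i (years) and populations y_i (hundreds).
t = np.array([1.0, 2.0, 4.0, 5.0, 8.0])
y = np.array([3.0, 4.0, 6.0, 11.0, 20.0])

def f(x):
    """Least-squares objective: sum of squared residuals of the exponential model."""
    residuals = x[0] * np.exp(x[1] * t) - y   # f_i(x) = x1*exp(x2*t_i) - y_i
    return np.sum(residuals**2)

print(f([2.5, 0.25]))   # objective value at a trial parameter choice
```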

A model of this type is illustrated in the figure below. If least squares were used to select the parameters, then we would solve

$$ \min_{x_1, x_2} \; f(x_1, x_2) = \sum_{i=1}^{5} \left( x_1 e^{x_2 t_i} - y_i \right)^2, $$

so that the $i$-th function would be

$$ f_i(x_1, x_2) = x_1 e^{x_2 t_i} - y_i, $$

that is, it would be the residual for the $i$-th data point. Most least-squares problems are of this form, where the functions $f_i(x)$ are residuals and where the index $i$ indicates the particular data point. This is one way in which least-squares problems are distinctive.

[Figure: Model of Antelope Population (population $y$, in hundreds, versus time $t$, in years)]

Least-squares problems are also distinctive in the way that the solution is interpreted. Least-squares problems usually incorporate some assumptions about the errors in the model. For example, we might have

$$ y_i = x_1 e^{x_2 t_i} + \epsilon_i, $$

where the errors $\{\epsilon_i\}$ are assumed to arise from a single probability distribution, often the normal distribution. Associated with our model are the "true" parameters $x_1$ and $x_2$, but each time we collect data and solve the least-squares problem we only obtain estimates $\hat{x}_1$ and $\hat{x}_2$ of these true parameters.

After computing these estimates, it is common to ask questions about the model such as: What bounds can we place on the values of the true parameters? Does the model adequately fit the data? How sensitive are the parameters to changes in the data? And so on.

The algorithms for least-squares problems are also distinctive. This is a consequence of the special structure of the Hessian matrix for the least-squares objective function. The Hessian in this case is the sum of two terms. The first only involves the gradients of the functions $\{f_i\}$ and so is easier to compute. The second involves the second derivatives, but is zero if the errors $\{\epsilon_i\}$ are all zero (that is, if the model fits the data perfectly). It is tempting to approximate the second term in the Hessian, and many algorithms for least squares do this. Additional techniques are used to deal with the first term in a computationally sensible manner.

If least-squares problems were uncommon, then even these justifications would not be enough to justify our discussion here.

But they are not uncommon. They are one of the most widely encountered unconstrained optimization problems, and amply justify the attention given them here.

Nonlinear Least-Squares Data Fitting

Let us first examine the special form of the derivatives in a least-squares problem. We will write the problem as

$$ \min_x \; f(x) = \tfrac{1}{2} \sum_{i=1}^{m} f_i(x)^2 = \tfrac{1}{2} F(x)^T F(x), $$

where $F$ is the vector-valued function

$$ F(x) = \left( f_1(x) \;\; f_2(x) \;\; \cdots \;\; f_m(x) \right)^T. $$

We have scaled the problem by $\tfrac{1}{2}$ to make the derivatives less cluttered. The components of $\nabla f(x)$ can be derived using the chain rule:

$$ \nabla f(x) = \nabla F(x) \, F(x). $$

$\nabla^2 f(x)$ can be derived by differentiating this formula with respect to $x_j$:

$$ \nabla^2 f(x) = \nabla F(x) \, \nabla F(x)^T + \sum_{i=1}^{m} f_i(x) \, \nabla^2 f_i(x). $$

These are the formulas for the gradient and Hessian of $f$.

Let $x_*$ be the solution of the least-squares problem, and suppose that at the solution $f(x_*) = 0$. Then $f_i(x_*) = 0$ for all $i$, indicating that all the residuals are zero and that the model fits the data with no error. As a result, $F(x_*) = 0$ and hence $\nabla f(x_*) = 0$, confirming that the first-order necessary condition is satisfied. It also follows that

$$ \nabla^2 f(x_*) = \nabla F(x_*) \, \nabla F(x_*)^T, $$

so that the Hessian at the solution is positive semi-definite, as expected.
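To make the gradient formula concrete, here is a small numerical check. It is an illustrative sketch (not from the original text): it assumes NumPy and the exponential antelope model from the introduction, and it compares $\nabla f(x) = \nabla F(x) F(x)$ with a central finite-difference gradient.

```python
import numpy as np

# Antelope data from the introduction: times (years) and populations (hundreds).
t = np.array([1.0, 2.0, 4.0, 5.0, 8.0])
y = np.array([3.0, 4.0, 6.0, 11.0, 20.0])

def residuals(x):
    """F(x): residuals f_i(x) = x1*exp(x2*t_i) - y_i."""
    return x[0] * np.exp(x[1] * t) - y

def jacobian(x):
    """J = dF/dx, an m-by-2 matrix; grad F(x) in the text's notation is J.T."""
    e = np.exp(x[1] * t)
    return np.column_stack([e, x[0] * t * e])

def objective(x):
    F = residuals(x)
    return 0.5 * F @ F

def gradient(x):
    # grad f(x) = grad F(x) F(x) = J(x).T @ F(x)
    return jacobian(x).T @ residuals(x)

# Check the analytic gradient against central finite differences.
x = np.array([2.5, 0.25])
h = 1e-6
fd = np.array([(objective(x + h * np.eye(2)[i]) - objective(x - h * np.eye(2)[i])) / (2 * h)
               for i in range(2)])
print(gradient(x), fd)   # the two vectors should agree closely
```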

If $\nabla F(x_*)$ is a matrix of full rank, then $\nabla^2 f(x_*)$ is positive definite.

Example (Gradient and Hessian). For the antelope data and model above,

$$ F(x) = \begin{pmatrix} x_1 e^{x_2 t_1} - y_1 \\ x_1 e^{x_2 t_2} - y_2 \\ x_1 e^{x_2 t_3} - y_3 \\ x_1 e^{x_2 t_4} - y_4 \\ x_1 e^{x_2 t_5} - y_5 \end{pmatrix} = \begin{pmatrix} x_1 e^{1 x_2} - 3 \\ x_1 e^{2 x_2} - 4 \\ x_1 e^{4 x_2} - 6 \\ x_1 e^{5 x_2} - 11 \\ x_1 e^{8 x_2} - 20 \end{pmatrix}. $$

The formula for the least-squares objective function is

$$ f(x_1, x_2) = \tfrac{1}{2} \sum_{i=1}^{5} \left( x_1 e^{x_2 t_i} - y_i \right)^2 = \tfrac{1}{2} F(x)^T F(x). $$

The gradient of $f$ is

$$ \nabla f(x_1, x_2) = \begin{pmatrix} \sum_{i=1}^{5} (x_1 e^{x_2 t_i} - y_i) \, e^{x_2 t_i} \\ \sum_{i=1}^{5} (x_1 e^{x_2 t_i} - y_i) \, x_1 t_i e^{x_2 t_i} \end{pmatrix}. $$

This can be rewritten as

$$ \nabla f(x_1, x_2) = \begin{pmatrix} e^{x_2 t_1} & e^{x_2 t_2} & e^{x_2 t_3} & e^{x_2 t_4} & e^{x_2 t_5} \\ x_1 t_1 e^{x_2 t_1} & x_1 t_2 e^{x_2 t_2} & x_1 t_3 e^{x_2 t_3} & x_1 t_4 e^{x_2 t_4} & x_1 t_5 e^{x_2 t_5} \end{pmatrix} \begin{pmatrix} x_1 e^{x_2 t_1} - y_1 \\ \vdots \\ x_1 e^{x_2 t_5} - y_5 \end{pmatrix}, $$

so that $\nabla f(x_1, x_2) = \nabla F(x) F(x)$. The Hessian matrix is

$$ \nabla^2 f(x) = \nabla F(x) \nabla F(x)^T + \sum_{i=1}^{5} f_i(x) \nabla^2 f_i(x) = \nabla F(x) \nabla F(x)^T + \sum_{i=1}^{5} \left( x_1 e^{x_2 t_i} - y_i \right) \begin{pmatrix} 0 & t_i e^{x_2 t_i} \\ t_i e^{x_2 t_i} & x_1 t_i^2 e^{x_2 t_i} \end{pmatrix}. $$

Note that $\{t_i\}$ and $\{y_i\}$ are the data values for the model, while $x_1$ and $x_2$ are the variables in the model.

If $f(x_*) = 0$ then it is reasonable to expect that $F(x) \approx 0$ for $x \approx x_*$, implying that

$$ \nabla^2 f(x) = \nabla F(x) \nabla F(x)^T + \sum_{i=1}^{m} f_i(x) \nabla^2 f_i(x) \approx \nabla F(x) \nabla F(x)^T. $$

This final formula only involves the first derivatives of the functions $\{f_i\}$ and suggests that an approximation to the Hessian matrix can be found using only first derivatives, at least in cases where the model is a good fit to the data.
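The following sketch (again illustrative, assuming NumPy and the antelope data) builds both terms of the Hessian for this example: the Gauss-Newton part $\nabla F(x) \nabla F(x)^T$ and the second-order term $\sum_i f_i(x) \nabla^2 f_i(x)$, so the size of the neglected term can be inspected directly.

```python
import numpy as np

t = np.array([1.0, 2.0, 4.0, 5.0, 8.0])
y = np.array([3.0, 4.0, 6.0, 11.0, 20.0])

def residuals(x):
    return x[0] * np.exp(x[1] * t) - y

def jacobian(x):
    e = np.exp(x[1] * t)
    return np.column_stack([e, x[0] * t * e])   # m-by-2; grad F = J.T

def full_hessian(x):
    """grad F grad F^T plus the second-order term from the example."""
    J = jacobian(x)
    F = residuals(x)
    H = J.T @ J
    for fi, ti in zip(F, t):
        e = np.exp(x[1] * ti)
        H = H + fi * np.array([[0.0,               ti * e],
                               [ti * e, x[0] * ti**2 * e]])
    return H

x = np.array([2.5, 0.25])
H_gn = jacobian(x).T @ jacobian(x)   # Gauss-Newton approximation
H    = full_hessian(x)
print(np.linalg.norm(H - H_gn))      # small when the residuals are small
```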

This idea is the basis for a number of specialized methods for nonlinear least squares. The simplest of these methods, called the Gauss-Newton method, uses this approximation directly. It computes a search direction using the formula for Newton's method

$$ \nabla^2 f(x) \, p = -\nabla f(x), $$

but replaces the Hessian with this approximation:

$$ \nabla F(x) \nabla F(x)^T p = -\nabla F(x) F(x). $$

In cases where $F(x_*) = 0$ and $\nabla F(x_*)$ is of full rank, the Gauss-Newton method behaves like Newton's method near the solution, but without the costs associated with computing second derivatives.

The Gauss-Newton method can perform poorly when the residuals at the solution are not small (that is, when the model does not fit the data well), or if the Jacobian of $F$ is not of full rank at the solution. Loosely speaking, in these cases the Gauss-Newton method is using a poor approximation to the Hessian of $f$.

Example (Gauss-Newton Method). We apply the Gauss-Newton method to an exponential model of the form $y_i \approx x_1 e^{x_2 t_i}$ with data

$$ t = (1 \;\; 2 \;\; 4 \;\; 5 \;\; 8)^T, \qquad y = (\;\;)^T. $$

In this example, the vector $y$ was chosen so that the model would be a good fit to the data, and hence we would expect the Gauss-Newton method to perform much like Newton's method.
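The Gauss-Newton iteration just described can be sketched in a few lines. This is an illustrative implementation under my own naming (not the text's code), assuming NumPy; it omits the line search, as in the example that follows.

```python
import numpy as np

def gauss_newton(residuals, jacobian, x0, tol=1e-10, max_iter=50):
    """Sketch of the Gauss-Newton method without a line search.

    residuals(x) returns F(x); jacobian(x) returns the m-by-n matrix
    J = dF/dx, so that grad F(x) in the text's notation is J.T.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        F = residuals(x)
        J = jacobian(x)
        g = J.T @ F                        # grad f(x) = grad F(x) F(x)
        if np.linalg.norm(g) < tol:
            break
        p = np.linalg.solve(J.T @ J, -g)   # (grad F grad F^T) p = -grad F F
        x = x + p                          # no line search
    return x
```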

(In general $y$ will not be chosen, but will be part of the given data for a problem.) We apply the Gauss-Newton method without a line search, using an initial guess that is close to the solution. At this point we form $F(x)$ and $\nabla F(x)$, and from them the gradient $\nabla f(x) = \nabla F(x) F(x)$ and the approximate Hessian $\nabla F(x) \nabla F(x)^T$. The Gauss-Newton search direction is obtained by solving the linear system

$$ \nabla F(x) \nabla F(x)^T p = -\nabla F(x) F(x), $$

giving the step $p$ and the new estimate of the solution $x \leftarrow x + p$. (For simplicity, we do not use a line search here, although a practical method would require such a globalization strategy.) The complete iteration is given in the table below.

Table: Gauss-Newton iteration (ideal data)

  k    f(x_k)       ||grad f(x_k)||
  0    2 × 10^0     3 × 10^2
  1    4 × 10^-3    2 × 10^1
  2    2 × 10^-8    3 × 10^-2
  3    3 × 10^-9    4 × 10^-8
  4    3 × 10^-9    3 × 10^-13

Since $f(x) \approx 0$, an approximate global solution has been found to the least-squares problem. (The least-squares objective function cannot be negative.) In general, the Gauss-Newton method is only guaranteed to find a local minimizer. For comparison, we now apply Newton's method to the same problem using the same initial guess.
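For illustration, the loop below traces the quantities reported in the table, $f(x_k)$ and $\|\nabla f(x_k)\|$, over a Gauss-Newton run. Since the example's vector $y$ and starting point are not reproduced in this copy, the sketch uses the antelope data from the introduction and an arbitrary starting point; it assumes NumPy.

```python
import numpy as np

# Antelope data and exponential model from the introduction.
t = np.array([1.0, 2.0, 4.0, 5.0, 8.0])
y = np.array([3.0, 4.0, 6.0, 11.0, 20.0])
F = lambda x: x[0] * np.exp(x[1] * t) - y
J = lambda x: np.column_stack([np.exp(x[1] * t), x[0] * t * np.exp(x[1] * t)])

x = np.array([2.5, 0.25])                           # arbitrary illustrative starting point
for k in range(6):
    g = J(x).T @ F(x)
    print(k, 0.5 * F(x) @ F(x), np.linalg.norm(g))  # k, f(x_k), ||grad f(x_k)||
    x = x + np.linalg.solve(J(x).T @ J(x), -g)      # Gauss-Newton step, no line search
```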

At this point we evaluate $\nabla f(x)$ and $\nabla^2 f(x)$. (This matrix is similar to the matrix used in the Gauss-Newton method.) The search direction is the solution of

$$ \nabla^2 f(x) \, p = -\nabla f(x), $$

giving the step $p$ and the new estimate $x \leftarrow x + p$. The complete iteration is given in the table below. The solution obtained is almost identical to that obtained by the Gauss-Newton method.

Table: Newton iteration (ideal data)

  k    f(x_k)       ||grad f(x_k)||
  0    2 × 10^0     3 × 10^2
  1    1 × 10^-1    5 × 10^1
  2    2 × 10^-4    9 × 10^-1
  3    5 × 10^-9    6 × 10^-3
  4    6 × 10^-9    8 × 10^-8
  5    3 × 10^-9    1 × 10^-12

We now consider the same model $y_i \approx x_1 e^{x_2 t_i}$ but with the data

$$ t = (\;\;)^T, \qquad y = (3 \;\; 4 \;\; 6 \;\; 11 \;\; 20 \;\; 46)^T. $$

This corresponds to the antelope data of the introduction, but with an extraneous data point added. (This point is called an outlier, since it is inconsistent with the other data points for this model.) In this case the exponential model will not be a good fit to the data, so we would expect the performance of the Gauss-Newton method to deteriorate. The runs corresponding to the initial guess are given in the table for the data set with the outlier, below. As expected, the Gauss-Newton method converges slowly. Both methods find the same solution.
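As a sketch of the comparison being made here (illustrative, assuming NumPy and the antelope data; not the example's actual numbers), a single Newton direction using the full Hessian can be computed alongside the Gauss-Newton direction.

```python
import numpy as np

t = np.array([1.0, 2.0, 4.0, 5.0, 8.0])
y = np.array([3.0, 4.0, 6.0, 11.0, 20.0])
x = np.array([2.5, 0.25])

e = np.exp(x[1] * t)
F = x[0] * e - y                                   # residual vector F(x)
J = np.column_stack([e, x[0] * t * e])             # Jacobian dF/dx, so grad F = J.T
H2 = sum(Fi * np.array([[0.0, ti * ei], [ti * ei, x[0] * ti**2 * ei]])
         for Fi, ti, ei in zip(F, t, e))           # second-order Hessian term

p_newton = np.linalg.solve(J.T @ J + H2, -(J.T @ F))   # Newton direction
p_gauss  = np.linalg.solve(J.T @ J,      -(J.T @ F))   # Gauss-Newton direction
print(p_newton, p_gauss)   # nearly equal here because the residuals are small
```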

The initial guess was close to the solution, so the slow convergence of the Gauss-Newton method was not due to a poor initial guess. Also, the final function value is large, indicating that the model cannot fit the data well in this case. This is to be expected given that an outlier is present.

[Table: Gauss-Newton (left) and Newton (right) iterations; data set with outlier]

The other methods for nonlinear least-squares problems can be interpreted as using some approximation to the second term in the formula for the Hessian matrix,

$$ \sum_{i=1}^{m} f_i(x) \, \nabla^2 f_i(x). $$

The oldest and simplest of these approximations is

$$ \sum_{i=1}^{m} f_i(x) \, \nabla^2 f_i(x) \approx \lambda I, $$

where $\lambda \geq 0$ is some scalar. Then the search direction is obtained by solving the linear system

$$ \left[ \nabla F(x) \nabla F(x)^T + \lambda I \right] p = -\nabla F(x) F(x). $$

This is referred to as the Levenberg-Marquardt method.

The Levenberg-Marquardt method is often implemented in the context of a trust-region strategy. If this is done then the search direction is obtained by minimizing a quadratic model of the objective function (based on the Gauss-Newton approximation to the Hessian)

$$ \min_p \; Q(p) = f(x) + p^T \nabla F(x) F(x) + \tfrac{1}{2} p^T \nabla F(x) \nabla F(x)^T p $$

subject to the constraint $\|p\| \leq \Delta$ for some scalar $\Delta > 0$.
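As a sketch of how the Levenberg-Marquardt linear system can be used in practice, the following is an illustrative loop (my own naming, with a simple accept/reject rule for $\lambda$ rather than the trust-region procedure the text refers to), assuming NumPy.

```python
import numpy as np

def levenberg_marquardt(residuals, jacobian, x0, lam=1e-2, max_iter=100, tol=1e-10):
    """Illustrative Levenberg-Marquardt iteration.

    Solves (grad F grad F^T + lam*I) p = -grad F F at each step and adjusts
    lam with a simple accept/reject heuristic (an assumption of this sketch).
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        F = residuals(x)
        J = jacobian(x)
        g = J.T @ F
        if np.linalg.norm(g) < tol:
            break
        p = np.linalg.solve(J.T @ J + lam * np.eye(len(x)), -g)
        F_trial = residuals(x + p)
        if 0.5 * F_trial @ F_trial < 0.5 * F @ F:   # step reduced f: accept, shrink lam
            x = x + p
            lam /= 10.0
        else:                                       # step failed: reject, grow lam
            lam *= 10.0
    return x
```

For small $\lambda$ the step approaches the Gauss-Newton direction, while for large $\lambda$ it approaches a short steepest-descent step; the heuristic above therefore enlarges $\lambda$ whenever a trial step fails to reduce the objective.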

This gives a step $p$ that satisfies the Levenberg-Marquardt formula for an appropriate $\lambda \geq 0$. The scalar $\lambda$ is determined indirectly by picking a value of $\Delta$, as described in the discussion of trust-region methods. The scalar $\Delta$ can be chosen based on the effectiveness of the Gauss-Newton approximation to the Hessian, and this can be easier than choosing $\lambda$ directly. An example illustrating a trust-region approach can be found in that same discussion.

Both the Gauss-Newton and Levenberg-Marquardt methods use an approximation to the Hessian $\nabla^2 f(x)$. If this approximation is not accurate then the methods will converge more slowly than Newton's method; in fact, they will converge at a linear rate.

Other approximations to the Hessian of $f(x)$ are also possible. For example, a quasi-Newton approximation to

$$ \sum_{i=1}^{m} f_i(x) \, \nabla^2 f_i(x) $$

could be used.

There is one other computational detail associated with the Gauss-Newton method that should be mentioned. The formula for the search direction in the Gauss-Newton method

$$ \nabla F(x) \nabla F(x)^T p = -\nabla F(x) F(x) $$

is equivalent to the solution of a linear least-squares problem

$$ \min_p \; \left\| \nabla F(x)^T p + F(x) \right\|_2^2 = \left( \nabla F(x)^T p + F(x) \right)^T \left( \nabla F(x)^T p + F(x) \right). $$
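To illustrate this equivalence (a sketch with assumed data, not from the original text), the step can be computed either from the normal equations or with a least-squares solver applied to the Jacobian, which avoids forming $\nabla F(x) \nabla F(x)^T$ explicitly.

```python
import numpy as np

t = np.array([1.0, 2.0, 4.0, 5.0, 8.0])
y = np.array([3.0, 4.0, 6.0, 11.0, 20.0])
x = np.array([2.5, 0.25])

F = x[0] * np.exp(x[1] * t) - y                     # residual vector F(x)
J = np.column_stack([np.exp(x[1] * t),              # Jacobian dF/dx, so grad F = J.T
                     x[0] * t * np.exp(x[1] * t)])

# Gauss-Newton step via the normal equations (grad F grad F^T) p = -grad F F ...
p_normal = np.linalg.solve(J.T @ J, -(J.T @ F))

# ... and via the equivalent linear least-squares problem min_p ||J p + F||_2.
p_lstsq, *_ = np.linalg.lstsq(J, -F, rcond=None)

print(p_normal, p_lstsq)   # the two steps agree (up to rounding error)
```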

