Transcription of METHODS FOR NON-LINEAR LEAST SQUARES PROBLEMS
1 IMMMETHODS FORNON-LINEAR LEASTSQUARES PROBLEMS2nd Edition, April 2004K. Madsen, Nielsen, O. TingleffInformatics and Mathematical ModellingTechnical University of DenmarkCONTENTS1. INTRODUCTION The Steepest Descent method .. Newton s Method .. Line Search .. Trust Region and Damped METHODS ..113. The Gauss Newton Method .. The Levenberg Marquardt Method .. Powell s Dog Leg Method .. A Hybrid Method: L M and Quasi Newton .. A Secant Version of the L M Method .. A Secant Version of the Dog Leg Method .. Final INTRODUCTION ANDDEFINITIONSIn this booklet we consider the following problem,Definition LEAST SQUARES ProblemFindx , a local minimizer for1)F(x)=12mXi=1(fi(x))2;wherefi:IRn7!I R;i=1;:::;mare given functions, andm important source of LEAST SQUARES PROBLEMS isdata consider thedata points(t1;y1);:::;(tm;ym)shown belowtyFigure pointsf(ti;yi)g(marked by+)and modelM(x;t)(marked by full line.
2 Further, we are given afitting model,M(x;t)=x3ex1t+x4ex2t:1)The factor12in the definition ofF(x)has no effect onx . It is introduced for conve-nience, see page INTRODUCTION ANDDEFINITIONS2 The model depends on theparametersx=[x1;x2;x3;x4]>. We assume thatthere exists anxyso thatyi=M(xy;ti)+"i;where thef"igare (measurement) errors on the data ordinates, assumed to be-have like white noise .For any choice ofxwe can compute theresidualsfi(x)=yi M(x;ti)=yi x3ex1ti x4ex2ti;i=1;:::;m:For aleast SQUARES fitthe parameters are determined as the minimizerx of thesum of squared residuals. This is seen to be a problem of the form in Defini-tion withn=4. The graph ofM(x ;t)is shown by full line in Figure LEAST SQUARES problem is a special variant of the more general problem:Given a functionF:IRn7!IR, find an argument ofFthat gives the minimumvalue of this so-calledobjective functionorcost Global MinimizerGivenF:IRn7!
3 IR. Findx+=argminxfF(x)g:This problem is very hard to solve in general, and we only present meth-ods for solving the simpler problem of finding a local minimizer forF,anargument vector which gives a minimum value ofFinside a certain regionwhose size is given by , where is a small, positive Local MinimizerGivenF:IRn7!IR. Findx so thatF(x ) F(x)forkx x k< :In the remainder of this introduction we shall discuss some basic concepts inoptimization, and Chapter 2 is a brief review of METHODS for finding a ANDDEFINITIONS minimizer for general cost functions. For more details we refer to Frandsenet al (2004). In Chapter 3 we give METHODS that are specially tuned for leastsquares assume that the cost functionFis differentiable and so smooth that thefollowingTaylor expansionis valid,2)F(x+h)=F(x)+h>g+12h>Hh+O(khk3);( )wheregis thegradient,g F0(x)=2666664@ @ theHessian,H F00(x)= @2F@xi@xj(x) :( )Ifx is a local minimizer andkhkis sufficiently small, then we cannot find apointx +hwith a smallerF-value.
4 Combining this observation with ( )we getTheorem Necessary condition for a local is a local minimizer, theng F0(x )=0:We use a special name for arguments that satisfy the necessary condition:Definition Stationary F0(xs)=0;thenxsis said to be astationary )Unless otherwise specified,k kdenotes the 2-norm,khk=qh21+ + INTRODUCTION ANDDEFINITIONS4 Thus, a local minimizer is also a stationary point, but so is a local stationary point which is neither a local maximizer nor a local minimizeris called asaddle point. In order to determine whether a given stationarypoint is a local minimizer or not, we need to include the second order termin the Taylor series ( ). Insertingxswe see thatF(xs+h)=F(xs)+12h>Hsh+O(khk3)withHs= F00(xs):( )From definition ( ) of the Hessian it follows that anyHis a symmetricmatrix. If we request thatHsispositive definite, then its eigenvalues aregreater than some number >0(see Appendix A), andh>Hsh> khk2:This shows that forkhksufficiently small the third term on the right-handside of ( ) will be dominated by the second.
5 This term is positive, so thatwe getTheorem Sufficient condition for a local thatxsis a stationary point and thatF00(xs)is positive a local definite, thenxsis a local maximizer. IfHsisindefinite(ieit has both positive and negative eigenvalues), thenxsis a saddle DESCENTMETHODSAll METHODS for NON-LINEAR optimization are iterative: From a starting pointx0the method produces a series of vectorsx1;x2;:::, which (hopefully)converges tox , a local minimizer for the given function, see Definition METHODS have measures which enforce thedescending conditionF(xk+1)<F(xk):( )This prevents convergence to a maximizer and also makes it less probablethat we converge towards a saddle point. If the given function has severalminimizers the result will depend on the starting pointx0. We do not knowwhich of the minimizers that will be found; it is not necessarily the mini-mizer closest many cases the method produces vectors which converge towards theminimizer in two clearly different stages.
6 Whenx0is far from the solutionwe want the method to produce iterates which move steadily towardsx .In this global stage of the iteration we are satisfied if the errors do notincrease except in the very first steps, iekek+1k<kekkfork>K;whereekdenotes the current error,ek=xk x :( )In the final stage of the iteration, wherexkis close tox , we want fasterconvergence. We distinguish betweenLinear convergence:kek+1k akekkwhenkekkis small;0<a<1;( )2. DESCENTMETHODS6 Quadratic convergence:kek+1k=O(kekk2)whenkekkis small;( )Superlinear convergence:kek+1k=kekk!0fork!1:( )The METHODS presented in this lecture note are descent METHODS which sat-isfy the descending condition ( ) in each step of the iteration. One stepfrom the current iterate consists in1. Find a descent directionhd(discussed below), and2. find a step length giving a good decrease in an outline of a descent method isAlgorithm Descent methodbegink:= 0;x:=x0;found:=falsefStarting pointgwhile(notfound)and(k<kmax)hd:=sear chdirection(x)fFromxand downhillgif(no suchhexists)found:=truefxis stationarygelse :=steplength(x;hd)ffromxin directionhdgx:=x+ hd;k:=k+1fnext iterategendConsider the variation of theF-value along the half line starting atxandwith directionh.
7 From the Taylor expansion ( ) we see thatF(x+ h)=F(x)+ h>F0(x)+O( 2)'F(x)+ h>F0(x)for sufficiently small.( )We say thathis adescent directionifF(x+ h)is a decreasing function of at =0. This leads to the following Descent a descent direction forFatxifh>F0(x)<0:If no suchhexists, thenF0(x)=0, showing that in this casexis , we have to choose , ie how far we should go fromxin thedirection given byhd, so that we get a decrease in the value of the objectivefunction. One way of doing this is to find (an approximation to) e=argmin >0fF(x+ h)g:( )The process is calledline search, and is discussed in Section First,however, we shall introduce two METHODS for computing a descent The Steepest Descent methodFrom ( ) we see that when we perform a step hwith positive , then therelative gain in function value satisfieslim !0F(x) F(x+ h) khk= 1khkh>F0(x)= kF0(x)kcos ;where is the angle between the vectorshandF0(x).
8 This shows that weget the greatest gain rate if = , ie if we use the steepest descent directionhsdgiven byhsd= F0(x):( )The method based on ( ) (iehd=hsdin Algorithm ) is called thesteep-est descent methodorgradient method. The choice of descent direction is the best (locally) and we could combine it with an exact line search ( ).A method like this converges, but the final convergence is linear and oftenvery slow. Examples in Frandsen et al (2004) show how the steepest descentmethod with exact line search and finite computer precision can fail to findthe minimizer of a second degree polynomial. For many PROBLEMS , however,the method has quite good performance in the initial stage of the Newton s Method8 Considerations like this has lead to the so-calledhybrid METHODS , which asthe name suggests are based on two different METHODS . One which is goodin the initial stage, like the gradient method, and another method which isgood in the final stage, like Newton s method; see the next section.
9 A majorproblem with a hybrid method is the mechanism which switches betweenthe two METHODS when Newton s MethodWe can derive this method from the condition thatx is a stationary to Definition it satisfiesF0(x )=0. This is a nonlinear sys-tem of equations, and from the Taylor expansionF0(x+h)=F0(x)+F00(x)h+O(khk2)'F 0(x)+F00(x)hforkhksufficiently smallwe deriveNewton s method:Findhnas the solutions toHhn= F0(x)withH=F00(x);( )and compute the next iterate byx:=x+hn:( )Suppose thatHis positive definite, then it is nonsingular (implying that( ) has a unique solution), andu>Hu>0for all nonzerou. Thus, bymultiplying withh>non both sides of ( ) we get0<h>nHhn= h>nF0(x);( )showing thathnis a descent direction: it satisfies the condition in Defini-tion s method is very good in the final stage of the iteration, wherexisclose tox . One can show (see Frandsen et al (2004)) that if the Hessianat the solution is positive definite (the sufficient condition in Theorem satisfied) and if we are at a position inside the region aroundx (x)is positive definite, then we get quadratic convergence (defined in( )).
10 On the other hand, ifxis in a region whereF00(x)is negative definiteeverywhere, and where there is a stationary point, the basic Newton method( ) would converge (quadratically) towards this stationary point, whichis a maximizer. We can avoid this by requiring that all steps taken are indescent can build a hybrid method, based on Newton s method and the steepestdescent method. According to ( ) the Newton step is guaranteed to bedownhill ifF00(x)is positive definite, so a sketch of the central section ofthis hybrid algorithm could beifF00(x)is positive definiteh:=hnelseh:=hsdx:=x+ h( )Here,hsdis the steepest descent direction and is found by line search; seeSection A good tool for checking a matrix for positive definiteness isCholesky s method (see Appendix A) which, when successful, is also usedfor solving the linear system in question. Thus, the check for definiteness isalmost for Section we introduce some METHODS , where the computation of thesearch directionhdand step length is done simultaneously, and give aversion of ( ) without line search.