
Line Search Methods for Unconstrained Optimisation


Lecture 8, Numerical Linear Algebra and Optimisation
Oxford University Computing Laboratory, MT 2007
Dr Raphael Hauser

Generic Framework

For the purposes of this lecture we consider the unconstrained minimisation problem

(UCM)   $\min_{x \in \mathbb{R}^n} f(x)$,

where $f \in C^1(\mathbb{R}^n, \mathbb{R})$ with Lipschitz continuous gradient $g(x)$. In practice these smoothness assumptions are sometimes violated, but the algorithms we will develop are still observed to work well.

The algorithms we will construct have the common feature that, starting from an initial educated guess $x_0 \in \mathbb{R}^n$ for a solution of (UCM), a sequence of iterates $(x_k)_{\mathbb{N}} \subset \mathbb{R}^n$ is produced such that $x_k \to x^* \in \mathbb{R}^n$, where $x^*$ satisfies the first and second order necessary optimality conditions

$g(x^*) = 0$,   $H(x^*) \succeq 0$ (positive semidefiniteness).

We usually wish to make progress towards solving (UCM) in every iteration; that is, we will construct $x_{k+1}$ so that $f(x_{k+1}) < f(x_k)$ (descent methods).

In practice we cannot usually compute $x^*$ precisely (that is, give a symbolic representation of it; see the LP lecture!), but we have to stop with an $x_k$ sufficiently close to $x^*$.

Optimality conditions are still useful, in that they serve as a stopping criterion when they are satisfied to within a predetermined error tolerance.

Finally, we wish to construct $(x_k)_{\mathbb{N}}$ such that convergence to $x^*$ takes place at a rapid rate, so that few iterations are needed until the stopping criterion is satisfied. This has to be counterbalanced with the computational cost per iteration, as there typically is a tradeoff between faster convergence and higher computational cost per iteration.

We write $f_k = f(x_k)$, $g_k = g(x_k)$, and $H_k = H(x_k)$.

Generic Line Search Method:

1. Pick an initial iterate $x_0$ by educated guess, set $k = 0$.
2. Until $x_k$ has converged,
   i) Calculate a search direction $p_k$ from $x_k$, ensuring that this direction is a descent direction, that is,
      $[g_k]^T p_k < 0$ if $g_k \neq 0$,
      so that for small enough steps away from $x_k$ in the direction $p_k$ the objective function will be reduced.
   ii) Calculate a suitable step length $\alpha_k > 0$ so that $f(x_k + \alpha_k p_k) < f_k$. The computation of $\alpha_k$ is called line search, and this is usually an inner iterative loop.
   iii) Set $x_{k+1} = x_k + \alpha_k p_k$.

Methods differ from one another in how steps i) and ii) are computed.
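The framework above can be written as a short Python sketch. This is a minimal illustration, not part of the lecture: the names generic_line_search_method, direction and line_search, the gradient-norm stopping test and the iteration cap are all assumptions made for the example.

```python
import numpy as np

def generic_line_search_method(f, g, x0, direction, line_search,
                               tol=1e-8, max_iter=1000):
    """Generic descent framework: x_{k+1} = x_k + alpha_k * p_k.

    `direction(x, grad)` must return a descent direction p_k
    (i.e. grad @ p_k < 0 whenever grad != 0), and
    `line_search(f, g, x, p)` must return a step alpha_k > 0
    with f(x + alpha_k * p) < f(x).
    """
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        grad = g(x)
        if np.linalg.norm(grad) <= tol:   # stopping criterion: first-order condition
            break
        p = direction(x, grad)            # step i): search direction
        alpha = line_search(f, g, x, p)   # step ii): inner line-search loop
        x = x + alpha * p                 # step iii): update the iterate
    return x
```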

Computing a Step Length $\alpha_k$

The challenges in finding a good $\alpha_k$ lie both in avoiding step lengths that are too long
[Figure: the objective function $f(x) = x^2$ and the iterates $x_{k+1} = x_k + \alpha_k p_k$ generated by the descent directions $p_k = (-1)^{k+1}$ and steps $\alpha_k = 2 + 3/2^{k+1}$ from $x_0 = 2$]
and step lengths that are too short
[Figure: the objective function $f(x) = x^2$ and the iterates $x_{k+1} = x_k + \alpha_k p_k$ generated by the descent directions $p_k = -1$ and steps $\alpha_k = 1/2^{k+1}$ from $x_0 = 2$].

Exact Line Search: In early days, $\alpha_k$ was picked to minimize

(ELS)   $\min_{\alpha > 0} f(x_k + \alpha p_k)$.

Although usable, this method is not considered cost effective.

Inexact Line Search Methods:
- Formulate a criterion that assures that steps are neither too long nor too short.
- Pick a good initial stepsize.
- Construct a sequence of updates that satisfy the above criterion after very few steps.

Backtracking Line Search:
1. Given $\alpha_{\text{init}} > 0$ (e.g., $\alpha_{\text{init}} = 1$), let $\alpha^{(0)} = \alpha_{\text{init}}$ and $l = 0$.
2. Until $f(x_k + \alpha^{(l)} p_k) < f_k$,
   i) set $\alpha^{(l+1)} = \tau \alpha^{(l)}$, where $\tau \in (0,1)$ is fixed (e.g., $\tau = \tfrac{1}{2}$),
   ii) increment $l$ by 1.
3. Set $\alpha_k = \alpha^{(l)}$.

This method prevents the step from getting too small, but it does not prevent steps that are too long relative to the decrease in $f$.
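A minimal Python sketch of this plain backtracking loop, under the same assumptions as the framework sketch above; the argument g is accepted only so that the routine can be passed to generic_line_search_method, and the default values and the cap on the number of backtracking steps are illustrative.

```python
def backtracking(f, g, x, p, alpha_init=1.0, tau=0.5, max_backtracks=50):
    """Plain backtracking: shrink alpha until f decreases at all.

    This only enforces f(x + alpha*p) < f(x); it does not rule out steps
    that are too long relative to the achieved decrease.  The gradient g
    is unused here.
    """
    fx = f(x)
    alpha = alpha_init
    for _ in range(max_backtracks):
        if f(x + alpha * p) < fx:   # any decrease is accepted
            return alpha
        alpha *= tau                # shrink the trial step
    return alpha
```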

To improve the method, we need to tighten the requirement $f(x_k + \alpha^{(l)} p_k) < f_k$. To prevent long steps relative to the decrease in $f$, we require the Armijo condition

$f(x_k + \alpha_k p_k) \leq f(x_k) + \alpha_k \beta \, [g_k]^T p_k$

for some fixed $\beta \in (0,1)$. That is to say, we require that the achieved reduction in $f$ be at least a fixed fraction of the reduction promised by the first-order Taylor approximation of $f$ at $x_k$.

Backtracking-Armijo Line Search:
1. Given $\alpha_{\text{init}} > 0$ (e.g., $\alpha_{\text{init}} = 1$), let $\alpha^{(0)} = \alpha_{\text{init}}$ and $l = 0$.
2. Until $f(x_k + \alpha^{(l)} p_k) \leq f(x_k) + \alpha^{(l)} \beta \, [g_k]^T p_k$,
   i) set $\alpha^{(l+1)} = \tau \alpha^{(l)}$, where $\tau \in (0,1)$ is fixed (e.g., $\tau = \tfrac{1}{2}$),
   ii) increment $l$ by 1.
3. Set $\alpha_k = \alpha^{(l)}$.

Theorem 1 (Termination of Backtracking-Armijo). Let $f \in C^1$ with gradient $g(x)$ that is Lipschitz continuous with constant $\gamma_k$ at $x_k$, and let $p_k$ be a descent direction at $x_k$. Then, for fixed $\beta \in (0,1)$,
i) the Armijo condition $f(x_k + \alpha p_k) \leq f_k + \alpha \beta \, [g_k]^T p_k$ is satisfied for all $\alpha \in [0, \alpha^{\max}_k]$, where
   $\alpha^{\max}_k = \dfrac{2(\beta - 1)\,[g_k]^T p_k}{\gamma_k \|p_k\|_2^2}$,
ii) and furthermore, for fixed $\tau \in (0,1)$, the stepsize generated by the backtracking-Armijo line search terminates with
   $\alpha_k \geq \min\!\left(\alpha_{\text{init}},\ \dfrac{2\tau(\beta - 1)\,[g_k]^T p_k}{\gamma_k \|p_k\|_2^2}\right)$.

We remark that in practice $\gamma_k$ is not known. Therefore, we cannot simply compute $\alpha^{\max}_k$ and $\alpha_k$ via the explicit formulas given by the theorem, and we still need the algorithm above.

Theorem 2 (Convergence of Generic LSM with B-A Steps). Let the gradient $g$ of $f \in C^1$ be uniformly Lipschitz continuous on $\mathbb{R}^n$. Then, for the iterates generated by the Generic Line Search Method with Backtracking-Armijo step lengths, one of the following situations occurs:
i) $g_k = 0$ for some finite $k$,
ii) $\lim_{k \to \infty} f_k = -\infty$,
iii) $\lim_{k \to \infty} \min\!\left(|[g_k]^T p_k|,\ \dfrac{|[g_k]^T p_k|}{\|p_k\|_2}\right) = 0$.
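The backtracking-Armijo loop can be sketched the same way. The parameter names beta, tau and alpha_init follow the notation above; the default values and the cap on the number of backtracking steps are illustrative choices, not prescribed by the lecture.

```python
def backtracking_armijo(f, g, x, p, alpha_init=1.0, beta=1e-4, tau=0.5,
                        max_backtracks=50):
    """Backtracking-Armijo line search.

    Accept the first alpha with
        f(x + alpha*p) <= f(x) + alpha * beta * g(x)^T p,
    shrinking alpha by the factor tau otherwise.
    """
    fx = f(x)
    slope = g(x) @ p      # [g_k]^T p_k, negative for a descent direction
    alpha = alpha_init
    for _ in range(max_backtracks):
        if f(x + alpha * p) <= fx + alpha * beta * slope:   # Armijo condition
            return alpha
        alpha *= tau
    return alpha
```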

Computing a Search Direction $p_k$

Method of Steepest Descent:
The most straightforward choice of a search direction, $p_k = -g_k$, is called the steepest-descent direction.

- $p_k$ is a descent direction.
- $p_k$ solves the problem $\min_{p \in \mathbb{R}^n} m^L_k(x_k + p) = f_k + [g_k]^T p$ subject to $\|p\|_2 = \|g_k\|_2$.
- $p_k$ is cheap to compute.

A method that uses the steepest-descent direction as a search direction is a method of steepest descent.

Intuitively, it would seem that $p_k$ is the best search direction one can find. If that were true then much of optimisation theory would not exist!

Theorem 3 (Global Convergence of Steepest Descent). Let the gradient $g$ of $f \in C^1$ be uniformly Lipschitz continuous on $\mathbb{R}^n$. Then, for the iterates generated by the Generic LSM with B-A steps and steepest-descent search directions, one of the following situations occurs:
i) $g_k = 0$ for some finite $k$,
ii) $\lim_{k \to \infty} f_k = -\infty$,
iii) $\lim_{k \to \infty} g_k = 0$.

Advantages and disadvantages of steepest descent:
- Globally convergent (converges to a local minimiser from any starting point $x_0$).
- Many other methods switch to steepest descent when they do not make sufficient progress.
- Not scale invariant (changing the inner product on $\mathbb{R}^n$ changes the notion of gradient!).
- Convergence is usually very (very!) slow (linear).
- Numerically, it is often not convergent at all.
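As a usage illustration, a method of steepest descent is obtained by passing $p_k = -g_k$ to the sketches above; the ill-conditioned quadratic test function is an assumption made purely for this example.

```python
import numpy as np

# Illustrative test problem: f(x) = 1/2 x^T A x with an ill-conditioned A,
# on which steepest descent converges but slowly (linear rate).
A = np.diag([1.0, 100.0])
f = lambda x: 0.5 * x @ A @ x
g = lambda x: A @ x

steepest_descent_dir = lambda x, grad: -grad   # p_k = -g_k

x_min = generic_line_search_method(f, g, x0=np.array([1.0, 1.0]),
                                   direction=steepest_descent_dir,
                                   line_search=backtracking_armijo)
```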

[Figure: the objective function $f(x, y) = 10(y - x^2)^2 + (x - 1)^2$ (Rosenbrock function), and the iterates generated by the generic line search method with steepest-descent directions.]

More General Descent Methods:
Let $B_k$ be a symmetric, positive definite matrix, and define the search direction $p_k$ as the solution to the linear system

$B_k p_k = -g_k$.

- $p_k$ is a descent direction, since $[g_k]^T p_k = -[g_k]^T [B_k]^{-1} g_k < 0$.
- $p_k$ solves the problem $\min_{p \in \mathbb{R}^n} m^Q_k(x_k + p) = f_k + [g_k]^T p + \tfrac{1}{2} p^T B_k p$.
- $p_k$ corresponds to the steepest descent direction if the norm $\|x\|_{B_k} := \sqrt{x^T B_k x}$ is used on $\mathbb{R}^n$ instead of the canonical Euclidean norm. This change of metric can be seen as preconditioning that can be chosen so as to speed up the steepest descent method.
- If the Hessian $H_k$ of $f$ at $x_k$ is positive definite, and $B_k = H_k$, this is Newton's method.
- If $B_k$ changes at every iterate $x_k$, a method based on the search direction $p_k$ is called a variable metric method. In particular, Newton's method is a variable metric method.

Theorem 4 (Global Convergence of More General Descent Direction Methods). Let the gradient $g$ of $f \in C^1$ be uniformly Lipschitz continuous on $\mathbb{R}^n$. Then, for the iterates generated by the Generic LSM with B-A steps and search directions defined by $B_k p_k = -g_k$, one of the following situations occurs:
i) $g_k = 0$ for some finite $k$,
ii) $\lim_{k \to \infty} f_k = -\infty$,
iii) $\lim_{k \to \infty} g_k = 0$,
provided that the eigenvalues of $B_k$ are uniformly bounded above, and uniformly bounded away from zero.
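A small sketch of this search direction computation, assuming $B_k$ is available as a symmetric positive definite matrix; numpy's dense solver stands in for whichever factorisation one would use in practice.

```python
import numpy as np

def descent_direction(B, grad):
    """Solve B p = -g for the search direction p.

    For B = I this reduces to steepest descent; for B = H_k (positive
    definite) it is the Newton direction.  Whenever B is SPD and g != 0,
    the returned p satisfies g^T p = -g^T B^{-1} g < 0.
    """
    return np.linalg.solve(B, -grad)
```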

Theorem 5 (Local Convergence of Newton's Method). Let the Hessian $H$ of $f \in C^2$ be uniformly Lipschitz continuous on $\mathbb{R}^n$. Let iterates $x_k$ be generated via the Generic LSM with B-A steps using $\alpha_{\text{init}} = 1$ and $\beta < \tfrac{1}{2}$, and using the Newton search direction $n_k$, defined by $H_k n_k = -g_k$. If $(x_k)_{\mathbb{N}}$ has an accumulation point $x^*$ where $H(x^*) \succ 0$ (positive definite), then
i) $\alpha_k = 1$ for all $k$ large enough,
ii) $\lim_{k \to \infty} x_k = x^*$,
iii) the sequence converges Q-quadratically, that is, there exists $\kappa > 0$ such that
   $\lim_{k \to \infty} \dfrac{\|x_{k+1} - x^*\|}{\|x_k - x^*\|^2} \leq \kappa$.

The mechanism that makes Theorem 5 work is that once the sequence $(x_k)_{\mathbb{N}}$ enters a certain domain of attraction of $x^*$, it cannot escape again, and quadratic convergence to $x^*$ commences.

Note that this is only a local convergence result; that is, Newton's method is not guaranteed to converge to a local minimiser from all starting points.

The fast convergence of Newton's method becomes apparent when we apply it to the Rosenbrock function:
[Figure: the objective function $f(x, y) = 10(y - x^2)^2 + (x - 1)^2$, and the iterates generated by the generic line search method with Newton directions.]
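For illustration only, a sketch of damped Newton's method (Newton directions combined with the backtracking-Armijo steps sketched earlier) applied to the Rosenbrock-type function from the figures; the starting point and the hand-coded derivatives are assumptions made for this example.

```python
import numpy as np

# Rosenbrock-type function from the figures: f(x, y) = 10(y - x^2)^2 + (x - 1)^2.
def f(z):
    x, y = z
    return 10.0 * (y - x**2) ** 2 + (x - 1.0) ** 2

def g(z):
    x, y = z
    return np.array([-40.0 * x * (y - x**2) + 2.0 * (x - 1.0),
                     20.0 * (y - x**2)])

def H(z):
    x, y = z
    return np.array([[-40.0 * (y - x**2) + 80.0 * x**2 + 2.0, -40.0 * x],
                     [-40.0 * x, 20.0]])

newton_dir = lambda z, grad: np.linalg.solve(H(z), -grad)   # H_k n_k = -g_k

x_min = generic_line_search_method(f, g, x0=np.array([-1.2, 1.0]),
                                   direction=newton_dir,
                                   line_search=backtracking_armijo)
# x_min is approximately (1, 1), the minimiser of f.
```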

Modified Newton Methods:
The use of $B_k = H_k$ only makes sense at iterates $x_k$ where $H_k \succ 0$. Since this is usually not guaranteed to always be the case, we modify the method as follows:
- Choose $M_k \succeq 0$ so that $H_k + M_k$ is sufficiently positive definite, with $M_k = 0$ if $H_k$ itself is sufficiently positive definite.
- Set $B_k = H_k + M_k$ and solve $B_k p_k = -g_k$.

The term $M_k$ is typically chosen as one of the following:
- If $H_k$ has the spectral decomposition $H_k = Q_k D_k [Q_k]^T$, then
  $H_k + M_k = Q_k \max(\varepsilon I, |D_k|) [Q_k]^T$.
- $M_k = \max(0, -\lambda_{\min}(H_k))\, I$.
- Modified Cholesky method:
  1. Compute a factorisation $P H_k P^T = L B L^T$, where $P$ is a permutation matrix, $L$ a unit lower triangular matrix, and $B$ a block diagonal matrix with blocks of size 1 or 2.
  2. Choose a matrix $F$ such that $B + F$ is sufficiently positive definite.
  3. Let $H_k + M_k = P^T L (B + F) L^T P$.

Other Modifications of Newton's Method:
1. Build a cheap approximation $B_k$ to $H_k$: quasi-Newton approximation (BFGS, SR1, etc.), or use finite-difference approximations.
2. Instead of solving $B_k p_k = -g_k$ for $p_k$, if $B_k \succ 0$, approximately solve the convex quadratic programming problem
   (QP)   $p_k \approx \arg\min_{p \in \mathbb{R}^n} f_k + p^T g_k + \tfrac{1}{2} p^T B_k p$.
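A minimal numpy sketch of the first (spectral) modification above, replacing each eigenvalue of $H_k$ by $\max(\varepsilon, |\lambda_i|)$; the threshold eps is an assumed tuning parameter.

```python
import numpy as np

def spectral_modification(H, eps=1e-8):
    """Return B = H + M from the spectral decomposition H = Q D Q^T,
    with the eigenvalues replaced by max(eps, |lambda_i|), so that B
    is (sufficiently) positive definite."""
    eigvals, Q = np.linalg.eigh(H)               # H = Q diag(eigvals) Q^T
    modified = np.maximum(eps, np.abs(eigvals))  # max(eps*I, |D|)
    return Q @ np.diag(modified) @ Q.T
```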

The conjugate gradient method is a good solver for step 2:
1. Set $p^{(0)} = 0$, $g^{(0)} = g_k$, $d^{(0)} = -g_k$, and $i = 0$.
2. Until $g^{(i)}$ is sufficiently small or $i = n$, repeat
   i) $\alpha^{(i)} = \dfrac{\|g^{(i)}\|_2^2}{[d^{(i)}]^T B_k d^{(i)}}$,
   ii) $p^{(i+1)} = p^{(i)} + \alpha^{(i)} d^{(i)}$,
   iii) $g^{(i+1)} = g^{(i)} + \alpha^{(i)} B_k d^{(i)}$,
   iv) $\beta^{(i)} = \dfrac{\|g^{(i+1)}\|_2^2}{\|g^{(i)}\|_2^2}$,
   v) $d^{(i+1)} = -g^{(i+1)} + \beta^{(i)} d^{(i)}$,
   vi) increment $i$ by 1.
3. Output $p_k \approx p^{(i)}$.

Important features of the conjugate gradient method:
- $[g_k]^T p^{(i)} < 0$ for all $i$, that is, the algorithm always stops with a descent direction as an approximation to $p_k$.
- Each iteration is cheap, as it only requires the computation of matrix-vector and vector-vector products.
- Usually, $p^{(i)}$ is a good approximation of $p_k$ well before $i = n$.
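The inner conjugate gradient loop above translates almost line by line into Python; the residual tolerance is an illustrative choice, and B is assumed symmetric positive definite.

```python
import numpy as np

def cg_direction(B, g_k, tol=1e-10):
    """Approximately solve B p = -g_k by the conjugate gradient iteration
    above, stopping once the residual is small or after n steps."""
    n = g_k.shape[0]
    p = np.zeros(n)          # p^(0) = 0
    g = g_k.copy()           # g^(0) = g_k (residual)
    d = -g_k.copy()          # d^(0) = -g_k
    for i in range(n):
        if np.linalg.norm(g) <= tol:
            break
        Bd = B @ d
        alpha = (g @ g) / (d @ Bd)           # step i)
        p = p + alpha * d                    # step ii)
        g_new = g + alpha * Bd               # step iii)
        beta = (g_new @ g_new) / (g @ g)     # step iv)
        d = -g_new + beta * d                # step v)
        g = g_new                            # step vi): next iteration
    return p                                 # p_k ~ p^(i)
```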

