Transcription of 6.252 NONLINEAR PROGRAMMING LECTURE 4 …
1 NONLINEAR PROGRAMMINGLECTURE 4 convergence ANALYSIS OF GRADIENT METHODSLECTURE OUTLINE Gradient Methods - Choice of Stepsize Gradient Methods - convergence IssuesCHOICES OF STEPSIZE I Minimization Rule: kis such thatf(xk+ kdk) = min 0f(xk+ dk). Limited Minimization Rule: Min over [0,s] Armijo rule: f(xk)'dk f(xk)'dk0 Set of AcceptableStepsizes s sUnsuccessful StepsizeTrials 2sStepsize k = f(xk + dk) - f(xk) Start withsand continue with s, 2s, .., until msfallswithin the set of withf(xk) f(xk+ dk) f(xk) OF STEPSIZE II Constant stepsize: kis such that k=s:a constant Diminishing stepsize: k 0but satisfies the infinite travel condition k=0 k= GRADIENT METHODS WITH ERRORSxk+1=xk k( f(xk)+ek)whereekis an uncontrollable error vector Several special cases: eksmall relative to the gradient; , for allk, ek < f(xk) f(xk)ekgkIllustration of the descentproperty of the directiongk= f(xk)+ek. {ek}is bounded, , for allk, ek ,where is some scalar.
2 {ek}is proportional to the stepsize, , forallk, ek q k,whereqis some scalar. {ek}areindependentzeromeanrandomvec-tors convergence ISSUES Only convergence to stationary points can beguaranteed Even convergence to a single limit may be hardto guarantee (capture theorem) Danger of nonconvergence if directionsdktendto be orthogonal to f(xk) Gradient related condition:For any subsequence{xk}k Kthat converges toa nonstationary point, the corresponding subse-quence{dk}k Kis bounded and satisfieslim supk ,k K f(xk) dk<0. Satisfied ifdk= Dk f(xk)and the eigenval-ues ofDkare bounded above and bounded awayfrom zeroCONVERGENCE RESULTSCONSTANT AND DIMINISHING STEPSIZESLet{xk}be a sequence generated by a gradientmethodxk+1=xk+ kdk, where{dk}is gradientrelated. Assume that for some constantL>0,we have f(x) f(y) L x y , x, y n,Assume that either(1) there exists a scalar such that for allk0< k (2 )| f(xk) dk|L dk 2or(2) k 0and k=0 k= .Then eitherf(xk) or else{f(xk)}con-verges to a finite value and f(xk) PROOF IDEA0 f(xk)'dk + (1/2) 2L||dk||2 f(xk)'dk = | f(xk)'dk|L||dk|||2 f(xk + dk) - f(xk) The idea of the convergence proof for a constant the descent directiondk, the cost differ-encef(xk+ dk) f(xk) is majorized by f(xk) dk+12 2L dk 2(based on the Lipschitz assumption; see nextslide).
3 Minimization of this function over yields the step-size =| f(xk) dk|L dk 2 This stepsize reduces the cost functionfas LEMMALet be a scalar and letg( )=f(x+ y).Havef(x+y) f(x)=g(1) g(0) = 10dgd ( )d = 10y f(x+ y)d 10y f(x)d + 10y f(x+ y) f(x) d 10y f(x)d + 10 y f(x+ y) f(x) d y f(x)+ y 10L y d =y f(x)+L2 y RESULT ARMIJO RULELet{xk}be generated byxk+1=xk+ kdk, where{dk}is gradient related and kis chosen by theArmijo rule. Then every limit point of{xk}is Outline:Assumexis a nonstationary limitpoint. Thenf(xk) f(x),so k f(xk) dk 0. If{xk}K x,lim supk ,k K f(xk) dk<0,by gradient relatedness, so that{ k}K 0. By the Armijo rule, for largek Kf(xk) f xk+( k/ )dk < ( k/ ) f(xk) dk and k= k dk ,wehavef(xk) f(xk+ kpk) k< f(xk) the Mean Value Theorem and letk .We get f(x) p f(x) p,wherepis a limitpoint ofpk a contradiction since f(x) p<0.