Transcription of Projected Gradient Algorithm
1 Projected Gradient AlgorithmAndersen AngMath ematique et recherche op erationnelleUMONS, draft: August 2, 2017 Last update : October 23, 2020 Overview1 Constrained and unconstrained problem2 Understanding the geometry of projection3 PGD is a special case of proximal gradient4 Theorem 1. An inequality of PGD with constant stepsize5 Theorem 2. PGD converges at orderO(1 k)on Lipschitz function6 Summary2 / 22 Constrained and unconstrained problemIFor unconstrained minimization problemminx Rnf(x),anyxinRncan be a constrained minimization problem with a given setQ Rnminx Qf(x),not anyxcan be a solution, the solution has to be inside the example of constrained minimization problem:minx Rn Ax b x 2 1can be expressed asmin x 2 1 Ax b / 22 Solving unconstrained problem by Gradient descentIGradient Descent(GD) is a standard (easy and simple) way to solveunconstrainedoptimization from an initial pointx0 Rn, GD iterates the followingequation until a stopping condition is met:xk+1=xk k f(xk),where fis the Gradient off, the parameter 0is the step size,andkis the iteration.
2 How aboutconstrainedproblem? Is it possible totuneGDto fit constrained problem?Answer: yes, and the key : Iffis not differentiable, we can replace Gradient by subgradient, and we get theso-called subgradient / 22 Solving constrained problem by Projected Gradient descentIProjected Gradient Descent(PGD) is a standard (easy and simple)way to solveconstrainedoptimization a constraint setQ Rn, starting from a initial pointx0 Q, PGD iterates the following equation until a stoppingcondition is met:xk+1=PQ(xk k f(xk)).IPQ(.)is the projection operator, and itself is also an optimizationproblem:PQ(x0) = arg minx Q12 x x0 22, given a pointx0,PQtry to find a pointx Qwhich is closest / 22 About the projectionIPQ(.)is a function fromRntoRn, and itself is an optimizationproblem:PQ(x0) = arg minx Q12 x x0 is an economic Algorithm if the problem is easy to is not true for generalQand there are lots of constraint setsthat are very difficult to project a convex set, the optimization problem has a unique nonconvex, the solution toPQ(x0)may not be unique: it givesmore than one / 22 Comparing PGD to GDIGD1.
3 Pick an initial pointx0 Rn2. Loop until stopping condition is Descent direction: pick the descent direction as f(xk) Stepsize: pick a step size Update:xk+1=xk k f(xk)IPGD1. Pick an initial pointx0 Q2. Loop until stopping condition is Descent direction: pick the descent direction as f(xk) Stepsize: pick a step size Update:yk+1=xk k f(xk) Projection:xk+1= argminx Q12 x yk+1 22 IPGD has one more step: the idea of PGD is simple: if the pointxk k f(xk)after thegradient update is leaving the setQ, project it / 22 Understanding the geometry of projection .. (1/5)Consider a convex setQand a pointx0 Q, the closest point tox0inQwill distance between a point to itself is :12 x x0 22= 0givesx= / 22 Understanding the geometry of projection.
4 (2/5)Now consider a convex setQand a pointx0/ Q: / 22 Understanding the geometry of projection .. (3/5)IThe circles areL2norm ball centered atx0with different on these circles areequidistanttox0(with differentL2distance on different circles).INote that some points on the blue circle are / 22 Understanding the geometry of projection .. (4/5)IThe point insideQwhich is closest tox0is the point where theL2norm ball touches this example, the blue pointyis the solution toPQ(x0) = argminx Q12 x x0 fact, it can be proved that, such point is always located on theboundaryofQforx0/ Q. That is, mathematically,argminx Q12 x x0 22 bdQifx0/ / 22 Understanding the geometry of projection .. (5/5)Note that the projection isorthogonal: the blue pointyis always on astraight line that is tangent to the norm ball / 22 PGD is a special case of proximal gradientIThe indicator function, denoted asi(x), of a setQis defined asfollows: ifx Q, theni(x) = 0; ifx/ Q, theni(x) =.
5 IWith the indicator function, constrained problem has two equivalentexpressionsminx Qf(x) minxf(x) +i(x).IProximal Gradient is a method to solve the optimization problem of asum of differentiable and a non-differentiable function:minxf(x) +g(x),wheregis a non-differentiable is in fact the special case of proximal Gradient whereg(x)is theindicator function of the constrain set. See here for more aboutproximal Gradient .13 / 22On PGD convergence rateITheorem 1. Iffis convex , PGD with constant stepsize satisfiesf(1K+ 1K k=0xk) f x0 x 222 (K+ 1)+ 2(K+ 1)K k=0 f(xk) 22,wheref =f(x )is the optimal cost value,x is the (global)minimizer, is the constant stepsize,Kis the total of number ofiteration of this theorem: the term1K+1 Kk=0xkis the average of the sequencexkafterKiteration, hence we can denoteit as xandf( x)as f.
6 Then the theorem reads: f f x0 x 222 (K+ 1)+something the convergence rate is likeO(1K).IFor the second term on the right hand side, as long as Kk=0 f(xk) 22is not diverging to infinity, or the growth of it isslower thanK, then the term 2(K+ 1) Kk=0 f(xk) / 22 Proof of theorem 1 .. (1/5)Ifis convex sof(x) f(z) f(x),x z .IPutx=xk,z=x andf(x ) =f :f(xk) f f(xk),xk x .IPGD updateyk+1=xk k f(xk)gives f(xk) =xk yk+1 kandf(xk) f 1 k xk yk+1,xk x .IAs we use constant stepsize:f(xk) f 1 xk yk+1,xk x .15 / 22 Proof of theorem 1 .. (2/5)IA trick(a b)(a c) =a2 ac ab+bc=2a2 2ac 2ab+ 2bc2=a2 2ac+a2 2ab+ 2bc+c2 c2+b2 b22=(a c)2+ (a b)2 (b c)22 IHencef(xk) f 1 xk yk+1,xk x =12 ( xk x 22+ xk yk+1 22 yk+1 x 22) =12 ( xk x 22 yk+1 x 22)+ 2 f(xk) 22where * is due to PGD updatexk yk+1= f(xk)16 / 22 Proof of theorem 1.
7 (3/5)Note that yk+1 x 22 xk+1 x Q(y) =xk+1x Hence yk+1 x 22 xk+1 x 22andf(xk) f 12 ( xk x 22 yk+1 x 22)+ 2 f(xk) 22 12 ( xk x 22 xk+1 x 22)+ 2 f(xk) 22It forms a telescoping series !17 / 22 Proof of theorem 1 .. (4/5)k= 0f(x0) f x0 x 22 x1 x 222 + 2 f(x0) 22k= 1f(x1) f x1 x 22 x2 x 222 + 2 f(x1) f(xk) f xk x 22 xK+1 x 222 + 2 f(xk) 22 Sums allK k=0(f(xk) f ) x0 x 22 xk+1 x 222 + 2K k=0 f(xk) / 22 Proof of theorem 1 .. (5/5)As0 12 xk+1 x 22,K k=0(f(xk) f ) x0 x 222 + 2K k=0 f(xk) the summation on the left and divide the whole equation byK+ 11K+ 1K k=0f(xk) f x0 x 222 (K+ 1)+ 2(K+ 1)K k=0 f(xk) the left hand side, asfis convex , by Jensen s inequalityf(1K+ 1K k=0xk) 1K+ 1K k=0f(xk).Thereforef(1K+ 1K k=0xk) f x0 x 222 (K+ 1)+ 2(K+ 1)K k=0 f(xk) / 22 PGD converges at orderO(1 k)on Lipschitz functionTheorem 2.
8 Iffis Lipschitz, for the point xK={1K+ 1K k=0xk}andconstant stepsize = x0 x L K+ 1we havef( xK) f L x0 x K+ 1 Proof. Put xK, into theorem 1 directly, note that f Lipschitz then fis bounded: f L, whereLis theLipschitz the stepsize , note that it isK(total number of step) notk(current iteration number).IThe stepsize requires to knowx , so this theorem is practicallyuseless as knowingx already solves the / 22 DiscussionIn the convergence analysis of convex and -smooth ( Gradient is -Lipschitz)2. Convergence rateO(1k).In the convergence analysis of convex andL-Lipschitz ( Gradient is bounded above)2. Convergence rateO(1 k).3. The convergence rate works on xKIffis convex and -smooth, the convergence of PGD will be the same asthat of convergence rate of PGD on convex and -smoothfwillalso beO(1k).
9 IHowever practically it depends on the complexity of the difficult to project / 22 Last page - summaryIPGD = GD + projectionIPGD with constant stepsize :f(1K+ 1K k=0xk) f x0 x 222 (K+ 1)+ 2(K+ 1)K k=0 f(xk) 22 IIffis Lipschitz (bounded Gradient ), for the point xK={1K+1 Kk=0xk}and constant step size = x0 x L K+1thenf( xK) f L x0 x K+ of document22 / 22