Example: bachelor of science

Projected Gradient Algorithm

Projected Gradient AlgorithmAndersen AngMath ematique et recherche op erationnelleUMONS, draft: August 2, 2017 Last update : October 23, 2020 Overview1 Constrained and unconstrained problem2 Understanding the geometry of projection3 PGD is a special case of proximal gradient4 Theorem 1. An inequality of PGD with constant stepsize5 Theorem 2. PGD converges at orderO(1 k)on Lipschitz function6 Summary2 / 22 Constrained and unconstrained problemIFor unconstrained minimization problemminx Rnf(x),anyxinRncan be a constrained minimization problem with a given setQ Rnminx Qf(x),not anyxcan be a solution, the solution has to be inside the example of constrained minimization problem:minx Rn Ax b x 2 1can be expressed asmin x 2 1 Ax b / 22 Solving unconstrained problem by Gradient descentIGradient Descent(GD) is a standard (easy and simple) way to solveunconstrainedoptimization from an initial pointx0 Rn, GD iterates the followingequation until a stopping condition is met:xk+1=xk k f(xk),where fis the Gradient off, the parameter 0is the step size,andkis the iteration.

Oct 23, 2020 · Q(:) is a function from Rnto Rn, and itself is an optimization problem: P Q(x 0) = argmin x2Q 1 2 kx x 0k2 2: I PGD is an \economic" algorithm if the problem is easy to solve. This is not true for general Qand there are lots of constraint sets that are very di cult to project onto. I If Qis a convex set, the optimization problem has a unique ...

Tags:

  Projected, Algorithm, Optimization, Convex, Derating, Projected gradient algorithm

Information

Domain:

Source:

Link to this page:

Please notify us if you found a problem with this document:

Other abuse

Advertisement

Transcription of Projected Gradient Algorithm

1 Projected Gradient AlgorithmAndersen AngMath ematique et recherche op erationnelleUMONS, draft: August 2, 2017 Last update : October 23, 2020 Overview1 Constrained and unconstrained problem2 Understanding the geometry of projection3 PGD is a special case of proximal gradient4 Theorem 1. An inequality of PGD with constant stepsize5 Theorem 2. PGD converges at orderO(1 k)on Lipschitz function6 Summary2 / 22 Constrained and unconstrained problemIFor unconstrained minimization problemminx Rnf(x),anyxinRncan be a constrained minimization problem with a given setQ Rnminx Qf(x),not anyxcan be a solution, the solution has to be inside the example of constrained minimization problem:minx Rn Ax b x 2 1can be expressed asmin x 2 1 Ax b / 22 Solving unconstrained problem by Gradient descentIGradient Descent(GD) is a standard (easy and simple) way to solveunconstrainedoptimization from an initial pointx0 Rn, GD iterates the followingequation until a stopping condition is met:xk+1=xk k f(xk),where fis the Gradient off, the parameter 0is the step size,andkis the iteration.

2 How aboutconstrainedproblem? Is it possible totuneGDto fit constrained problem?Answer: yes, and the key : Iffis not differentiable, we can replace Gradient by subgradient, and we get theso-called subgradient / 22 Solving constrained problem by Projected Gradient descentIProjected Gradient Descent(PGD) is a standard (easy and simple)way to solveconstrainedoptimization a constraint setQ Rn, starting from a initial pointx0 Q, PGD iterates the following equation until a stoppingcondition is met:xk+1=PQ(xk k f(xk)).IPQ(.)is the projection operator, and itself is also an optimizationproblem:PQ(x0) = arg minx Q12 x x0 22, given a pointx0,PQtry to find a pointx Qwhich is closest / 22 About the projectionIPQ(.)is a function fromRntoRn, and itself is an optimizationproblem:PQ(x0) = arg minx Q12 x x0 is an economic Algorithm if the problem is easy to is not true for generalQand there are lots of constraint setsthat are very difficult to project a convex set, the optimization problem has a unique nonconvex, the solution toPQ(x0)may not be unique: it givesmore than one / 22 Comparing PGD to GDIGD1.

3 Pick an initial pointx0 Rn2. Loop until stopping condition is Descent direction: pick the descent direction as f(xk) Stepsize: pick a step size Update:xk+1=xk k f(xk)IPGD1. Pick an initial pointx0 Q2. Loop until stopping condition is Descent direction: pick the descent direction as f(xk) Stepsize: pick a step size Update:yk+1=xk k f(xk) Projection:xk+1= argminx Q12 x yk+1 22 IPGD has one more step: the idea of PGD is simple: if the pointxk k f(xk)after thegradient update is leaving the setQ, project it / 22 Understanding the geometry of projection .. (1/5)Consider a convex setQand a pointx0 Q, the closest point tox0inQwill distance between a point to itself is :12 x x0 22= 0givesx= / 22 Understanding the geometry of projection.

4 (2/5)Now consider a convex setQand a pointx0/ Q: / 22 Understanding the geometry of projection .. (3/5)IThe circles areL2norm ball centered atx0with different on these circles areequidistanttox0(with differentL2distance on different circles).INote that some points on the blue circle are / 22 Understanding the geometry of projection .. (4/5)IThe point insideQwhich is closest tox0is the point where theL2norm ball touches this example, the blue pointyis the solution toPQ(x0) = argminx Q12 x x0 fact, it can be proved that, such point is always located on theboundaryofQforx0/ Q. That is, mathematically,argminx Q12 x x0 22 bdQifx0/ / 22 Understanding the geometry of projection .. (5/5)Note that the projection isorthogonal: the blue pointyis always on astraight line that is tangent to the norm ball / 22 PGD is a special case of proximal gradientIThe indicator function, denoted asi(x), of a setQis defined asfollows: ifx Q, theni(x) = 0; ifx/ Q, theni(x) =.

5 IWith the indicator function, constrained problem has two equivalentexpressionsminx Qf(x) minxf(x) +i(x).IProximal Gradient is a method to solve the optimization problem of asum of differentiable and a non-differentiable function:minxf(x) +g(x),wheregis a non-differentiable is in fact the special case of proximal Gradient whereg(x)is theindicator function of the constrain set. See here for more aboutproximal Gradient .13 / 22On PGD convergence rateITheorem 1. Iffis convex , PGD with constant stepsize satisfiesf(1K+ 1K k=0xk) f x0 x 222 (K+ 1)+ 2(K+ 1)K k=0 f(xk) 22,wheref =f(x )is the optimal cost value,x is the (global)minimizer, is the constant stepsize,Kis the total of number ofiteration of this theorem: the term1K+1 Kk=0xkis the average of the sequencexkafterKiteration, hence we can denoteit as xandf( x)as f.

6 Then the theorem reads: f f x0 x 222 (K+ 1)+something the convergence rate is likeO(1K).IFor the second term on the right hand side, as long as Kk=0 f(xk) 22is not diverging to infinity, or the growth of it isslower thanK, then the term 2(K+ 1) Kk=0 f(xk) / 22 Proof of theorem 1 .. (1/5)Ifis convex sof(x) f(z) f(x),x z .IPutx=xk,z=x andf(x ) =f :f(xk) f f(xk),xk x .IPGD updateyk+1=xk k f(xk)gives f(xk) =xk yk+1 kandf(xk) f 1 k xk yk+1,xk x .IAs we use constant stepsize:f(xk) f 1 xk yk+1,xk x .15 / 22 Proof of theorem 1 .. (2/5)IA trick(a b)(a c) =a2 ac ab+bc=2a2 2ac 2ab+ 2bc2=a2 2ac+a2 2ab+ 2bc+c2 c2+b2 b22=(a c)2+ (a b)2 (b c)22 IHencef(xk) f 1 xk yk+1,xk x =12 ( xk x 22+ xk yk+1 22 yk+1 x 22) =12 ( xk x 22 yk+1 x 22)+ 2 f(xk) 22where * is due to PGD updatexk yk+1= f(xk)16 / 22 Proof of theorem 1.

7 (3/5)Note that yk+1 x 22 xk+1 x Q(y) =xk+1x Hence yk+1 x 22 xk+1 x 22andf(xk) f 12 ( xk x 22 yk+1 x 22)+ 2 f(xk) 22 12 ( xk x 22 xk+1 x 22)+ 2 f(xk) 22It forms a telescoping series !17 / 22 Proof of theorem 1 .. (4/5)k= 0f(x0) f x0 x 22 x1 x 222 + 2 f(x0) 22k= 1f(x1) f x1 x 22 x2 x 222 + 2 f(x1) f(xk) f xk x 22 xK+1 x 222 + 2 f(xk) 22 Sums allK k=0(f(xk) f ) x0 x 22 xk+1 x 222 + 2K k=0 f(xk) / 22 Proof of theorem 1 .. (5/5)As0 12 xk+1 x 22,K k=0(f(xk) f ) x0 x 222 + 2K k=0 f(xk) the summation on the left and divide the whole equation byK+ 11K+ 1K k=0f(xk) f x0 x 222 (K+ 1)+ 2(K+ 1)K k=0 f(xk) the left hand side, asfis convex , by Jensen s inequalityf(1K+ 1K k=0xk) 1K+ 1K k=0f(xk).Thereforef(1K+ 1K k=0xk) f x0 x 222 (K+ 1)+ 2(K+ 1)K k=0 f(xk) / 22 PGD converges at orderO(1 k)on Lipschitz functionTheorem 2.

8 Iffis Lipschitz, for the point xK={1K+ 1K k=0xk}andconstant stepsize = x0 x L K+ 1we havef( xK) f L x0 x K+ 1 Proof. Put xK, into theorem 1 directly, note that f Lipschitz then fis bounded: f L, whereLis theLipschitz the stepsize , note that it isK(total number of step) notk(current iteration number).IThe stepsize requires to knowx , so this theorem is practicallyuseless as knowingx already solves the / 22 DiscussionIn the convergence analysis of convex and -smooth ( Gradient is -Lipschitz)2. Convergence rateO(1k).In the convergence analysis of convex andL-Lipschitz ( Gradient is bounded above)2. Convergence rateO(1 k).3. The convergence rate works on xKIffis convex and -smooth, the convergence of PGD will be the same asthat of convergence rate of PGD on convex and -smoothfwillalso beO(1k).

9 IHowever practically it depends on the complexity of the difficult to project / 22 Last page - summaryIPGD = GD + projectionIPGD with constant stepsize :f(1K+ 1K k=0xk) f x0 x 222 (K+ 1)+ 2(K+ 1)K k=0 f(xk) 22 IIffis Lipschitz (bounded Gradient ), for the point xK={1K+1 Kk=0xk}and constant step size = x0 x L K+1thenf( xK) f L x0 x K+ of document22 / 22


Related search queries