Transcription of Steepest Descent Method - PSU
1 Steepest Descent Method Kefu Liu Properties of Gradient Vector The gradient vector of a scalar function f ( x1 , x2 ,", xn ) is defined as a column vector T. f f f . f = " =c x1 x2 xn . For example f ( x1 , x2 ) = 25 x12 + x22. at the point x1* = .6, x2* = 4. 2(25) x1* 2(25)(.6) 30 . c = f = * = = . 2 x2 2(4) 8 . The normalized gradient vector c c=. cT c For example, at the point x1* = .6, x2* = 4. 1 30 .96625 . c= = . 302 + 82 8 .2577 . Property 1. The gradient vector represents a direction of maximum rate of increase for the function f (x) at x* . For example, f (.6, 4) = 25(.6)2 + 42 = 25. If we increase x in the direction c by a step size of = .5..6 .96625 . x (1) = x(0) + c = + .5 = . 4 .2577 . The function value becomes 1.
2 F (x(1) ) = 25( )2 + ( )2 = If we move in a direction [1 0]. T..6 1 . x(1) = x(0) + c = + .5 = . 4 0 4 . The function value becomes f (x(1) ) = 25( )2 + (4)2 = If we move in a direction [ 0 1]. T..6 0 .6 . x (1) = x(0) + c = + .5 = . 4 1 . The function value becomes f (x(1) ) = 25(.6)2 + ( )2 = We can see that moving along the gradient direction results in the maximum increase in the function. Property 2. The gradient vector c of f ( x1 , x2 ,", xn ) at the point x* is orthogonal (normal) to the tangent plane for the surface f (x* ) = constant . For example, f ( x1 , x2 ) = 25 x12 + x22 = 25 the slope at x1* = .6, x2* = 4 can be found to be 2(25) x1dx1 + 2 x2 dx2 = 0. dx2 25 x1 25(.6). slope = = = = dx1 x2 4. The direction of the tangent line is given by 1.
3 T= .. c and t are normal each other as 2. 1 . cT t = [30 8] = 30 8( ) = 0.. Property 3. The maximum rate of change of f (x) at any point x* is the magnitude of the gradient vector given by c = cT c Steepest Descent direction. Let f ( x) be a differentiable function with respect to x . The direction of Steepest Descent for f (x) at any point is d = c or d = c Example. Use the Steepest Descent direction to search for the minimum for f ( x1 , x2 ) = 25 x12 + x22 starting at x (0) = [1 3] with a step size of = .5 . The function T. value at the starting point is f (x(0) ) = 25(1)2 + 32 = 34. An analytical solution revealsthat the minimum point is at x* = [ 0 0] and f (x* ) = 0 . Let T. us start the process of iterations. 2(25)(1) 50 (0) 1 50.
4 9929 . c(0) = = , c = = . 2(3) 6 502 + 62 6 .1191 . 1 .9929 .5035 . x (1) = x (0) .5 c (0) = .5 = . 3 .1191 . f (x(1) ) = 25(.5035)2 + ( )2 = 2(25)(.5035) (1) .9738 . c(1) = = , c = .2275 . 2( ) ..5035 .9738 .0166 . x(2) = x(1) .5 c (1) = .5 = ..2275 . f (x(2) ) = 25(.0166)2 + ( )2 = .83 (2) .1453 . c(2) = , c = .9894 .. 3..0166 .1453 .0561 . x(3) = x(2) .5 c (2) = .5 = ..9894 . f (x(3) ) = 25( .0561)2 + ( )2 = (3) .5154 . c(3) = , c = .8570 ..0561 .5154 .2016 . x(4) = x (3) .5 c (3) = .5 = ..857 . f (x(4) ) = 25(.2016)2 + ( )2 = (4) .9355 . c(4) = , c = .3533 ..2016 .9355 . x (5) = x (4) .5 c (4) = .5 = ..3533 . f (x(5) ) = 25( .2662)2 + ( )2 = It is noted that the function values start to oscillate, , not monotonically reduce.
5 This is caused by the constant step size. When the search is near the minimum, a smaller step size should be used. Otherwise, an overshoot will occur. Overshoot means that we move along the Steepest direction more than needed. As a matter of fact, we are supposed to find the best step size at each iteration by conducting a one-D optimization in the Steepest Descent direction. For example, the new point can be expressed as a function of step size , , 1 .9929 1 .9929 . x (1) = x(0) c (0) = = . 3 .1191 3 .1191 . f ( x (1) ) = 25(1 .9929 ) 2 + (3 .1191 ) 2. f ( x (1) ) is a function of . Using the analytical approach, we get df ( ). = 2(25)(1 .9929 (0) )( .9929) + 2(3 .1191 (0) )( .1191) = 0. d . 4. 25(.9929) + 3(.1191). (0) = = 25(.)
6 99292 ) + .11912. 1 .9929 . x(1) = x(0) (0) c (0) = = . 3 .1191 . f ( x (1) ) = 25( .0139) 2 + 2 = f = is the minimum value we can find at this iteration. Steepest Descent algorithm Step 1. Estimate a starting design x (0) and set the iteration counter k = 0 . Select a convergence parameter > 0 . Step 2. Calculate the gradient of f (x) at the point x ( k ) as c( k ) = f (x ( k ) ) . Calculate c = cT c . If c < , then stop the iteration process as x* = x(k ) is a minimum point. Otherwise, go to Step 3. Step 3. Let the search direction at the current point x ( k ) as d ( k ) = c( k ) . Step 4. Calculate a step size ( k ) to minimize f (x ( k ) + ( k ) d ( k ) ) . A one-dimensional search is used to determine ( k ) . Step 5.
7 Update the design as x( k +1) = x( k ) + ( k ) d( k ) . Set k = k + 1 and go to Step 2. 5.