The Levenberg-Marquardt algorithm for nonlinear least squares curve-fitting problems

Henri P. Gavin
Department of Civil and Environmental Engineering
Duke University
September 18, 2020

Abstract

The Levenberg-Marquardt algorithm was developed in the early 1960s to solve nonlinear least squares problems. Least squares problems arise in the context of fitting a parameterized mathematical model to a set of data points by minimizing an objective expressed as the sum of the squares of the errors between the model function and a set of data points. If a model is linear in its parameters, the least squares objective is quadratic in the parameters. This objective may be minimized with respect to the parameters in one step via the solution to a linear matrix equation. If the fit function is not linear in its parameters, the least squares problem requires an iterative solution algorithm. Such algorithms reduce the sum of the squares of the errors between the model function and the data points through a sequence of well-chosen updates to values of the model parameters.

The Levenberg-Marquardt algorithm combines two numerical minimization algorithms: the gradient descent method and the Gauss-Newton method. In the gradient descent method, the sum of the squared errors is reduced by updating the parameters in the steepest-descent direction. In the Gauss-Newton method, the sum of the squared errors is reduced by assuming the least squares function is locally quadratic in the parameters, and finding the minimum of this quadratic. The Levenberg-Marquardt method acts more like a gradient-descent method when the parameters are far from their optimal value, and acts more like the Gauss-Newton method when the parameters are close to their optimal value. This document describes these methods and illustrates the use of software to solve nonlinear least squares curve-fitting problems.

1 Introduction

In fitting a model function \hat{y}(t;p) of an independent variable t and a vector of n parameters p to a set of m data points (t_i, y_i), it is customary and convenient to minimize the sum of the weighted squares of the errors (or weighted residuals) between the data y_i and the curve-fit function \hat{y}(t;p):

\chi^2(p) = \sum_{i=1}^{m} \left[ \frac{y(t_i) - \hat{y}(t_i;p)}{\sigma_{y_i}} \right]^2       (1)
          = (y - \hat{y}(p))^T W (y - \hat{y}(p))                                              (2)
          = y^T W y - 2 y^T W \hat{y} + \hat{y}^T W \hat{y}                                    (3)

where \sigma_{y_i} is the measurement error for datum y(t_i). Typically the weighting matrix W is diagonal with W_{ii} = 1/\sigma_{y_i}^2. More formally, W can be set to the inverse of the measurement error covariance matrix, in the unusual case that it is known. More generally, the weights W_{ii} can be set to pursue other curve-fitting goals. This scalar-valued goodness-of-fit measure is called the chi-squared error criterion because the sum of squares of normally-distributed variables is distributed as the chi-squared distribution.

If the function \hat{y}(t;p) is nonlinear in the model parameters p, then the minimization of \chi^2 with respect to the parameters must be carried out iteratively. The goal of each iteration is to find a perturbation h to the parameters p that reduces \chi^2.
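As a concrete illustration of equations (1)-(3), the following Python sketch evaluates the weighted chi-squared objective. The exponential-decay model, the parameter values, and the synthetic data are hypothetical, chosen only for this example; they do not come from this document.

```python
import numpy as np

def chi_squared(y, y_hat, W):
    """Weighted sum of squared residuals, eq. (2): (y - y_hat)^T W (y - y_hat)."""
    r = y - y_hat
    return float(r.T @ W @ r)

# Hypothetical model y(t; p) = p0 * exp(-t / p1), used only for illustration.
def model(t, p):
    return p[0] * np.exp(-t / p[1])

t = np.linspace(0.0, 10.0, 25)              # independent variable
sigma_y = 0.1 * np.ones_like(t)             # assumed measurement errors
rng = np.random.default_rng(0)
y = model(t, np.array([2.0, 3.0])) + sigma_y * rng.standard_normal(t.size)

W = np.diag(1.0 / sigma_y**2)               # W_ii = 1 / sigma_yi^2
print(chi_squared(y, model(t, np.array([1.5, 2.5])), W))
```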

2 The Gradient Descent Method

The steepest descent method is a general minimization method which updates parameter values in the "downhill" direction: the direction opposite to the gradient of the objective function. The gradient descent method converges well for problems with simple objective functions [6, 7]. For problems with thousands of parameters, gradient descent methods are sometimes the only viable method.

The gradient of the chi-squared objective function with respect to the parameters is

\frac{\partial}{\partial p} \chi^2 = 2 (y - \hat{y}(p))^T W \frac{\partial}{\partial p} (y - \hat{y}(p))            (4)
                                   = -2 (y - \hat{y}(p))^T W \left[ \frac{\partial \hat{y}(p)}{\partial p} \right]   (5)
                                   = -2 (y - \hat{y})^T W J                                                          (6)

where the m x n Jacobian matrix [\partial \hat{y} / \partial p] represents the local sensitivity of the function \hat{y} to variation in the parameters p. For notational simplicity the variable J will be used for [\partial \hat{y} / \partial p]. Note that in models that are linear in the parameters, \hat{y} = X p, the Jacobian [\partial \hat{y} / \partial p] is the matrix of model basis vectors X. The parameter update h that moves the parameters in the direction of steepest descent is given by

h_{gd} = \alpha J^T W (y - \hat{y}),                                                                                 (7)

where the positive scalar \alpha determines the length of the step in the steepest-descent direction.
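A minimal sketch of the steepest-descent update (7). The Jacobian J, weighting matrix W, and residual r = y - y_hat are assumed to be supplied by the caller; the default step length alpha is an illustrative value, not a recommendation from this document.

```python
import numpy as np

def gradient_descent_step(J, W, r, alpha=1e-3):
    """Steepest-descent update h_gd = alpha * J^T W r, eq. (7), with r = y - y_hat."""
    return alpha * (J.T @ W @ r)
```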

3 The Gauss-Newton Method

The Gauss-Newton method is a method for minimizing a sum-of-squares objective function. It presumes that the objective function is approximately quadratic in the parameters near the optimal solution [2]. For moderately-sized problems the Gauss-Newton method typically converges much faster than gradient-descent methods [8].

The function evaluated with perturbed model parameters may be locally approximated through a first-order Taylor series expansion,

\hat{y}(p + h) \approx \hat{y}(p) + \left[ \frac{\partial \hat{y}}{\partial p} \right] h = \hat{y} + J h.            (8)

Substituting the approximation \hat{y}(p + h) \approx \hat{y}(p) + J h into equation (3) for \chi^2(p + h),

\chi^2(p + h) \approx y^T W y + \hat{y}^T W \hat{y} - 2 y^T W \hat{y} - 2 (y - \hat{y})^T W J h + h^T J^T W J h.     (9)

The first-order Taylor approximation (8) results in an approximation for \chi^2 that is quadratic in the perturbation h. The Hessian of the chi-squared fit criterion is approximately J^T W J.

The parameter update h that minimizes \chi^2 is found from \partial \chi^2 / \partial h = 0:

\frac{\partial}{\partial h} \chi^2(p + h) \approx -2 (y - \hat{y})^T W J + 2 h^T J^T W J,                            (10)

and the resulting normal equations for the Gauss-Newton update are

[J^T W J] h_{gn} = J^T W (y - \hat{y}).                                                                              (11)

Note that the right hand side vectors in the normal equations for the gradient descent method (7) and the Gauss-Newton method (11) are identical.
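Under the same assumptions as the sketch above (J, W, and the residual r = y - y_hat supplied by the caller), the Gauss-Newton update (11) amounts to solving one small linear system:

```python
import numpy as np

def gauss_newton_step(J, W, r):
    """Gauss-Newton update from the normal equations [J^T W J] h_gn = J^T W r, eq. (11)."""
    return np.linalg.solve(J.T @ W @ J, J.T @ W @ r)
```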

4 The Levenberg-Marquardt Method

The Levenberg-Marquardt algorithm adaptively varies the parameter updates between the gradient descent update and the Gauss-Newton update,

[J^T W J + \lambda I] h_{lm} = J^T W (y - \hat{y}),                                                                  (12)

where small values of the damping parameter \lambda result in a Gauss-Newton update and large values of \lambda result in a gradient descent update. The damping parameter \lambda is initialized to be large so that first updates are small steps in the steepest-descent direction. If any iteration happens to result in a worse approximation (\chi^2(p + h_{lm}) > \chi^2(p)), then \lambda is increased. Otherwise, as the solution improves, \lambda is decreased, the Levenberg-Marquardt method approaches the Gauss-Newton method, and the solution typically accelerates to the local minimum [6, 7, 8].

In Marquardt's update relationship [8], the values of \lambda are normalized to the values of J^T W J,

[J^T W J + \lambda \, \mathrm{diag}(J^T W J)] h_{lm} = J^T W (y - \hat{y}).                                          (13)
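A sketch covering both forms of the Levenberg-Marquardt step, equations (12) and (13). As in the earlier sketches, J, W, the residual r = y - y_hat, and the damping value lam are assumed inputs; the marquardt flag simply selects which of the two equations is solved.

```python
import numpy as np

def lm_step(J, W, r, lam, marquardt=False):
    """Solve eq. (12), [J^T W J + lam*I] h = J^T W r, or eq. (13),
    [J^T W J + lam*diag(J^T W J)] h = J^T W r, with r = y - y_hat."""
    JtWJ = J.T @ W @ J
    damping = np.diag(np.diag(JtWJ)) if marquardt else np.eye(JtWJ.shape[0])
    return np.linalg.solve(JtWJ + lam * damping, J.T @ W @ r)
```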

4.1 Implementation

Many variations of the Levenberg-Marquardt algorithm have been published in papers and in code, e.g., [4, 6, 10, 11]. This document borrows from some of these. In iteration i, the step h is evaluated by comparing \chi^2(p) to \chi^2(p + h). The step is accepted if the metric \rho_i [9] is greater than a user-specified threshold, \epsilon_4 > 0. This metric is a measure of the actual improvement in \chi^2 as compared to the improvement of an LM update assuming the approximation (8) were exact.

\rho_i(h_{lm}) = \frac{\chi^2(p) - \chi^2(p + h_{lm})}{(y - \hat{y})^T W (y - \hat{y}) - (y - \hat{y} - J h_{lm})^T W (y - \hat{y} - J h_{lm})}                          (14)

              = \frac{\chi^2(p) - \chi^2(p + h_{lm})}{h_{lm}^T \left( \lambda_i h_{lm} + J^T W (y - \hat{y}(p)) \right)}                    if using eq'n (12) for h_{lm}  (15)

              = \frac{\chi^2(p) - \chi^2(p + h_{lm})}{h_{lm}^T \left( \lambda_i \, \mathrm{diag}(J^T W J) h_{lm} + J^T W (y - \hat{y}(p)) \right)}   if using eq'n (13) for h_{lm}  (16)

If in an iteration \rho_i(h) > \epsilon_4, then p + h is sufficiently better than p, p is replaced by p + h, and \lambda is reduced by a factor. Otherwise \lambda is increased by a factor, and the algorithm proceeds to the next iteration.

4.1.1 Initialization and update of the L-M parameter, \lambda, and the parameters p

In lm.m users may select one of three methods for initializing and updating \lambda and p. (A sketch of method 1 appears after this list.)

1. \lambda_0 = \lambda_o; \lambda_o is user-specified [8]. Use eq'n (13) for h_{lm} and eq'n (16) for \rho.
   If \rho_i(h) > \epsilon_4:  p \leftarrow p + h;  \lambda_{i+1} = \max[\lambda_i / L_{down}, 10^{-7}];
   otherwise:  \lambda_{i+1} = \min[\lambda_i L_{up}, 10^{7}].

2. \lambda_0 = \lambda_o \max[\mathrm{diag}[J^T W J]]; \lambda_o is user-specified. Use eq'n (12) for h_{lm} and eq'n (15) for \rho.
   \alpha = \left( (J^T W (y - \hat{y}(p)))^T h \right) / \left( (\chi^2(p + h) - \chi^2(p))/2 + 2 (J^T W (y - \hat{y}(p)))^T h \right);
   If \rho_i(\alpha h) > \epsilon_4:  p \leftarrow p + \alpha h;  \lambda_{i+1} = \max[\lambda_i / (1 + \alpha), 10^{-7}];
   otherwise:  \lambda_{i+1} = \lambda_i + |\chi^2(p + \alpha h) - \chi^2(p)| / (2\alpha).

3. \lambda_0 = \lambda_o \max[\mathrm{diag}[J^T W J]]; \lambda_o is user-specified [9]. Use eq'n (12) for h_{lm} and eq'n (15) for \rho.
   If \rho_i(h) > \epsilon_4:  p \leftarrow p + h;  \lambda_{i+1} = \lambda_i \max[1/3, 1 - (2\rho_i - 1)^3];  \nu_i = 2;
   otherwise:  \lambda_{i+1} = \lambda_i \nu_i;  \nu_{i+1} = 2 \nu_i.

For the examples presented later in this document, method 1 [8] with L_{up} \approx 11 and L_{down} \approx 9 exhibits good convergence.
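The sketch below illustrates the accept/reject logic in the style of method 1: the gain ratio of eq. (16) and the factor-based update of lambda. The function names, the default threshold eps4, and the limits 10^-7 and 10^7 follow the description above, but the code itself is an illustrative reconstruction, not the lm.m implementation.

```python
import numpy as np

def rho_metric(chi2_old, chi2_new, h, JtWJ, JtWr, lam):
    """Gain ratio rho of eq. (16), for a step h computed from eq. (13)."""
    return (chi2_old - chi2_new) / (h @ (lam * np.diag(JtWJ) * h + JtWr))

def update_lambda_method1(rho, lam, eps4=1e-1, L_up=11.0, L_down=9.0):
    """Method-1 style decision: returns (step_accepted, new_lambda)."""
    if rho > eps4:
        return True, max(lam / L_down, 1e-7)   # accept p + h, relax damping
    return False, min(lam * L_up, 1e7)         # reject step, increase damping
```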

4.1.2 Computation and rank-1 update of the Jacobian, [\partial \hat{y} / \partial p]

In the first iteration, in every 2n iterations, and in iterations where \chi^2(p + h) > \chi^2(p), the Jacobian (J \in R^{m \times n}) is numerically approximated using forward differences,

J_{ij} = \frac{\partial \hat{y}_i}{\partial p_j} = \frac{\hat{y}(t_i; p + \delta p_j) - \hat{y}(t_i; p)}{||\delta p_j||},                 (17)

or central differences (the default),

J_{ij} = \frac{\partial \hat{y}_i}{\partial p_j} = \frac{\hat{y}(t_i; p + \delta p_j) - \hat{y}(t_i; p - \delta p_j)}{2 ||\delta p_j||},   (18)

where the j-th element of \delta p_j is the only non-zero element and is set to \Delta_j (1 + |p_j|). For problems with many parameters, a finite-differences Jacobian is computationally expensive. If the Jacobian is re-computed using finite differences only occasionally, convergence can be achieved with fewer function evaluations. So in all other iterations, the Jacobian is updated using the Broyden rank-1 update formula,

J = J + \frac{\left( \hat{y}(p + h) - \hat{y}(p) - J h \right) h^T}{h^T h}.                                                                (19)

This rank-1 Jacobian update equation (19) requires no additional function evaluations.
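A sketch of the central-difference Jacobian (18) and the Broyden rank-1 update (19). The model(t, p) callable and the default step constant Delta are assumptions made for the example.

```python
import numpy as np

def jacobian_cd(model, t, p, Delta=1e-6):
    """Central-difference Jacobian, eq. (18); model(t, p) returns y_hat at all t."""
    J = np.zeros((t.size, p.size))
    for j in range(p.size):
        dp = np.zeros_like(p)
        dp[j] = Delta * (1.0 + abs(p[j]))      # only the j-th element is non-zero
        J[:, j] = (model(t, p + dp) - model(t, p - dp)) / (2.0 * dp[j])
    return J

def broyden_update(J, h, y_hat_old, y_hat_new):
    """Broyden rank-1 Jacobian update, eq. (19); uses no extra function evaluations."""
    return J + np.outer(y_hat_new - y_hat_old - J @ h, h) / (h @ h)
```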

4.1.3 Convergence criteria

Convergence is achieved when one of the following three criteria is satisfied:

- Convergence in the gradient: \max |J^T W (y - \hat{y})| < \epsilon_1;
- Convergence in parameters: \max |h_i / p_i| < \epsilon_2; or
- Convergence in \chi^2: using the value of the reduced \chi^2, \chi_\nu^2 = \chi^2 / (m - n + 1) < \epsilon_3.

Otherwise, iterations terminate when the iteration count exceeds a pre-specified limit.
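A sketch of the three convergence tests; the default thresholds eps1, eps2, and eps3 are hypothetical placeholders, not values prescribed by this document.

```python
import numpy as np

def converged(J, W, r, h, p, chi2, m, n, eps1=1e-3, eps2=1e-3, eps3=1e-1):
    """True if any of the three convergence criteria is met; r = y - y_hat."""
    grad_ok  = np.max(np.abs(J.T @ W @ r)) < eps1        # convergence in the gradient
    param_ok = np.max(np.abs(h / p)) < eps2               # convergence in parameters
    chi2_ok  = chi2 / (m - n + 1) < eps3                  # convergence in reduced chi-squared
    return grad_ok or param_ok or chi2_ok
```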

4.2 Error Analysis

Once the optimal curve-fit parameters p_{fit} are determined, parameter statistics are computed for the converged solution. If the measurement error covariance matrix V_y, or its diagonal \sigma_{y_i}^2, is known a priori (prior to the curve-fit), the weighting matrix should be set to the inverse of the measurement error covariance, W = V_y^{-1}, in estimating the model parameters and in the following error analysis. Note that if the actual measurement errors vary significantly across the measurement points (e.g., \max_i(\sigma_{y_i}) / \min_i(\sigma_{y_i}) > 10), any error analysis that presumes equal measurement errors will be approximate.

The reduced \chi^2 error criterion,

\chi_\nu^2 = \frac{\chi^2}{m - n + 1} = \frac{1}{m - n + 1} (y - \hat{y}(p_{fit}))^T W (y - \hat{y}(p_{fit})),       (20)

is a measure of the quality of the fit. Large values, \chi_\nu^2 \gg 1, indicate a poor fit; \chi_\nu^2 \approx 1 indicates that the fit error is of the same order as the measurement error (as desired); and \chi_\nu^2 < 1 indicates that the model is over-fitting the data, that is, the model is fitting the measurement noise.

The parameter covariance matrix is computed from

V_p = [J^T W J]^{-1}.                                                                                                (21)

The asymptotic standard parameter errors,

\sigma_p = \sqrt{\mathrm{diag}([J^T W J]^{-1})},                                                                     (22)

give a measure of how unexplained variability in the data propagates to variability in the parameters, and are essentially an error measure for the parameters.

The asymptotic standard error of the fit,

\sigma_{\hat{y}} = \sqrt{\mathrm{diag}(J [J^T W J]^{-1} J^T)},                                                       (23)

indicates how variability in the parameters affects the variability in the curve-fit.

The asymptotic standard prediction error,

\sigma_{\hat{y}_p} = \sqrt{\mathrm{diag}(V_y + J [J^T W J]^{-1} J^T)},                                               (24)

reflects the standard error of the fit as well as the measurement error.

If the measurement error covariance, or individual measurement errors, are not known in advance of the analysis, the error analysis can be carried out assuming the same measurement error for every measurement point, as estimated from the fit,

\hat{\sigma}_y^2 = \frac{1}{m - n + 1} (y - \hat{y}(p_{fit}))^T (y - \hat{y}(p_{fit})).                              (25)
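A sketch of the post-fit statistics of equations (20)-(24), evaluated at the converged Jacobian and residual; the function signature and argument names are assumptions made for this example.

```python
import numpy as np

def error_analysis(J, W, r, m, n, Vy=None):
    """Post-fit statistics: reduced chi-squared, parameter covariance, and asymptotic
    standard errors; r = y - y_hat(p_fit)."""
    chi2_nu = float(r.T @ W @ r) / (m - n + 1)             # reduced chi-squared, eq. (20)
    Vp = np.linalg.inv(J.T @ W @ J)                        # parameter covariance, eq. (21)
    sigma_p = np.sqrt(np.diag(Vp))                         # standard parameter errors, eq. (22)
    sigma_yhat = np.sqrt(np.diag(J @ Vp @ J.T))            # standard error of the fit, eq. (23)
    sigma_ypred = None
    if Vy is not None:
        sigma_ypred = np.sqrt(np.diag(Vy + J @ Vp @ J.T))  # standard prediction error, eq. (24)
    return chi2_nu, Vp, sigma_p, sigma_yhat, sigma_ypred
```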