1 The adjoint method - Stanford University Computer Science

PDE-constrained optimization and the adjoint method
Andrew M. Bradley
October 15, 2019 (original November 16, 2010)
This document is licensed under CC BY.

PDE-constrained optimization and the adjoint method for solving these and related problems appear in a wide range of application domains. Often the adjoint method is used in an application without explanation. The purpose of this tutorial is to explain the method in detail in a general setting that is kept as simple as possible.

We use the following notation: the total derivative (gradient) is denoted $d_x$ (usually denoted $d(\cdot)/dx$ or $\nabla_x$); the partial derivative, $\partial_x$ (usually $\partial(\cdot)/\partial x$); the differential, $d$. We also use the notation $f_x$ for both partial and total derivatives when we think the meaning is clear from context. Recall that a gradient is a row vector, and this convention induces sizing conventions for the other operators. We use only real numbers in this tutorial.

1 The adjoint method

Let $x \in \mathbb{R}^{n_x}$ and $p \in \mathbb{R}^{n_p}$. Suppose we have the function $f(x,p) : \mathbb{R}^{n_x} \times \mathbb{R}^{n_p} \to \mathbb{R}$ and the relationship $g(x,p) = 0$ for a function $g : \mathbb{R}^{n_x} \times \mathbb{R}^{n_p} \to \mathbb{R}^{n_x}$ whose partial derivative $g_x$ is everywhere nonsingular. What is $d_p f$?

Motivation

The equation $g(x,p) = 0$ is often solved by a complicated software program that implements what is sometimes called a simulation or the forward problem. Given values for the parameters $p$, the program computes the values $x$. For example, $p$ could parameterize boundary and initial conditions and material properties for a discretized PDE, and $x$ are the resulting field values. $f(x,p)$ is often a measure of merit: for example, fit of $x$ to data at a set of locations, the smoothness of $x$ or $p$, or the degree to which $p$ attains a particular objective. Minimizing $f$ is sometimes called the inverse problem.

Later we shall discuss seismic tomography. In this application, $x$ are the field values in the wave (also referred to as the acoustic, second-order linear hyperbolic, second-order wave, etc.) equation, $p$ parameterizes the earth model and initial and boundary conditions, $f$ measures the difference between measured and synthetic waveforms, and $g$ encodes the wave equation, initial and boundary conditions, and generation of synthetic waveforms.

The gradient $d_p f$ is useful in many contexts: for example, to solve the optimization problem $\min_p f$ or to assess the sensitivity of $f$ to the elements of $p$.

One method to approximate $d_p f$ is to compute $n_p$ finite differences over the elements of $p$. Each finite difference computation requires solving $g(x,p) = 0$. For moderate to large $n_p$, this can be quite costly.
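
To make that cost concrete, here is a minimal Python/NumPy sketch (not from the original tutorial). The routine solve_forward(p), assumed to return the $x$ satisfying $g(x,p) = 0$, and the objective f(x, p) are hypothetical placeholders standing in for whatever forward solver and merit function a given application provides.

    import numpy as np

    def fd_gradient(f, solve_forward, p, h=1e-6):
        # One-sided finite differences: n_p + 1 forward solves in total.
        base = f(solve_forward(p), p)
        grad = np.zeros_like(p)
        for i in range(p.size):
            q = p.copy()
            q[i] += h
            # Every component of the gradient costs a full solve of g(x, q) = 0.
            grad[i] = (f(solve_forward(q), q) - base) / h
        return grad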

In the program to solve $g(x,p) = 0$, it is likely that the Jacobian matrix $\partial_x g$ is calculated (see the sections below on the relationship between the constraint and adjoint equations and on partial derivatives for further details). The adjoint method uses the transpose of this matrix, $g_x^T$, to compute the gradient $d_p f$. The computational cost is usually no greater than solving $g(x,p) = 0$ once and sometimes even less.

Derivation

In this section, we consider the slightly simpler function $f(x)$; see below for the full $f(x,p)$ case.

First,
$$d_p f = d_p f(x(p)) = \partial_x f \, d_p x \; (= f_x x_p). \qquad (1)$$
Second, $g(x,p) = 0$ everywhere implies $d_p g = 0$. (Note carefully that $d_p g = 0$ everywhere only because $g = 0$ everywhere. It is certainly not the case that a function that happens to be 0 at a point also has a 0 gradient there.) Expanding the total derivative,
$$g_x x_p + g_p = 0.$$
As $g_x$ is everywhere nonsingular, the final equality implies $x_p = -g_x^{-1} g_p$. Substituting this latter relationship into (1) yields
$$d_p f = -f_x g_x^{-1} g_p.$$
The expression $f_x g_x^{-1}$ is a row vector times an $n_x \times n_x$ matrix and may be understood in terms of linear algebra as the solution to the linear equation
$$g_x^T \lambda = -f_x^T, \qquad (2)$$
where $^T$ is the matrix transpose. The matrix conjugate transpose (just the transpose when working with reals) is also called the matrix adjoint, and for this reason, the vector $\lambda$ is called the vector of adjoint variables and the linear equation (2) is called the adjoint equation. In terms of $\lambda$, $d_p f = \lambda^T g_p$.
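
As a quick sanity check of this recipe, consider a small worked example (not from the original tutorial): take $f(x) = x^2$ and $g(x,p) = x - p^2$, so the forward problem gives $x(p) = p^2$. Then
$$g_x = 1, \quad g_p = -2p, \quad f_x = 2x, \qquad g_x^T \lambda = -f_x^T \;\Rightarrow\; \lambda = -2x = -2p^2,$$
$$d_p f = \lambda^T g_p = (-2p^2)(-2p) = 4p^3,$$
which agrees with differentiating $f(x(p)) = p^4$ directly.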

A second derivation is useful. Define the Lagrangian
$$L(x,p,\lambda) \equiv f(x) + \lambda^T g(x,p),$$
where in this context $\lambda$ is the vector of Lagrange multipliers. As $g(x,p)$ is everywhere zero by construction, we may choose $\lambda$ freely, $f(x) = L(x,p,\lambda)$, and
$$\begin{aligned}
d_p f(x) = d_p L &= \partial_x f \, d_p x + d_p\lambda^T \, g + \lambda^T(\partial_x g \, d_p x + \partial_p g)\\
&= f_x x_p + \lambda^T(g_x x_p + g_p) \qquad \text{because } g = 0 \text{ everywhere}\\
&= (f_x + \lambda^T g_x)\, x_p + \lambda^T g_p.
\end{aligned}$$
If we choose $\lambda$ so that $g_x^T \lambda = -f_x^T$, then the first term is zero and we can avoid calculating $x_p$. This condition is the adjoint equation (2). What remains, as in the first derivation, is $d_p f = \lambda^T g_p$.
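
Applied to the small example above (again, not from the original tutorial), the Lagrangian route gives the same answer without ever forming $x_p$:
$$L(x,p,\lambda) = x^2 + \lambda\,(x - p^2), \qquad d_p L = (2x + \lambda)\, x_p - 2p\,\lambda.$$
Choosing $\lambda = -2x$, which is the adjoint equation for this example, removes the $x_p$ term and leaves $d_p f = -2p\lambda = 4px = 4p^3$, as before.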

The relationship between the constraint and adjoint equations

Suppose $g(x,p) = 0$ is the linear (in $x$) equation $A(p)x - b(p) = 0$. As $\partial_x g = A(p)$, the adjoint equation is $A(p)^T \lambda = -f_x^T$. The two equations differ in form only by the adjoint.

If $g(x,p) = 0$ is a nonlinear equation, then software that solves the system for $x$ given a particular value for $p$ quite likely solves, at least approximately, a sequence of linear equations of the form
$$\partial_x g(x,p)\, \Delta x = -g(x,p). \qquad (3)$$
$\partial_x g = g_x$ is the Jacobian matrix for the function $g(x,p)$, and (3) is the linear system that gives the step to update $x$ in Newton's method. The adjoint equation $g_x^T \lambda = -f_x^T$ solves a linear system that differs in form from (3) only by the adjoint.
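
The following is a minimal numerical sketch of the whole recipe (not from the original tutorial): a made-up nonlinear $g(x,p)$ solved by Newton's method as in (3), the adjoint solve (2) reusing the transposed Jacobian, and a finite-difference check of $d_p f = \lambda^T g_p$. The particular $g$, $f$, and all names are illustrative assumptions.

    import numpy as np

    A = np.array([[3.0, 0.5, 0.0],
                  [0.5, 3.0, 0.5],
                  [0.0, 0.5, 3.0]])
    b = np.array([1.0, 2.0, 3.0])
    x_target = np.array([0.2, 0.1, 0.4])

    def g(x, p):
        # Hypothetical nonlinear constraint: g(x, p) = A x + p0 sin(x) - p1 b.
        return A @ x + p[0] * np.sin(x) - p[1] * b

    def g_x(x, p):
        # Jacobian of g with respect to x.
        return A + p[0] * np.diag(np.cos(x))

    def g_p(x, p):
        # Jacobian of g with respect to p (n_x-by-n_p).
        return np.column_stack([np.sin(x), -b])

    def f(x):
        return 0.5 * np.sum((x - x_target) ** 2)

    def f_x(x):
        return x - x_target

    def solve_forward(p):
        # Newton's method on g(x, p) = 0; each step solves g_x dx = -g, cf. (3).
        x = np.zeros_like(b)
        for _ in range(50):
            r = g(x, p)
            if np.linalg.norm(r) < 1e-12:
                break
            x = x + np.linalg.solve(g_x(x, p), -r)
        return x

    def adjoint_gradient(p):
        # One forward solve, then one adjoint solve with the transposed Jacobian, cf. (2).
        x = solve_forward(p)
        lam = np.linalg.solve(g_x(x, p).T, -f_x(x))
        return f(x), lam @ g_p(x, p)      # d_p f = lam^T g_p

    p = np.array([0.3, 0.7])
    value, grad = adjoint_gradient(p)

    # Central finite-difference check: two extra forward solves per parameter.
    h = 1e-6
    fd = np.array([(f(solve_forward(p + h * e)) - f(solve_forward(p - h * e))) / (2 * h)
                   for e in np.eye(p.size)])
    print(grad, fd)   # the two gradients should agree to roughly 1e-6

Note the cost asymmetry: the adjoint route costs one forward solve plus a single linear solve with $g_x^T$, whereas the finite-difference check costs additional forward solves for every parameter.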

f is a function of both x and p

Suppose our function is $f(x,p)$ and we still have $g(x,p) = 0$. How does this change the calculations? As
$$d_p f = f_x x_p + f_p = \lambda^T g_p + f_p,$$
the calculation changes only by the term $f_p$, which usually is no harder to compute in terms of computational complexity.

$f$ depends directly on $p$, for example, when the modeler wishes to weight or penalize certain parameters. For example, suppose $f$ originally measures the misfit between simulated and measured data; then $f$ depends directly only on $x$. But suppose the model parameters $p$ vary over space and the modeler prefers smooth distributions of $p$. Then a term can be added to $f$ that penalizes nonsmooth $p$.

Partial derivatives

We have seen that $\partial_x g$ is the Jacobian matrix for the nonlinear function $g(x,p)$ for fixed $p$. To obtain the gradient $d_p f$, $\partial_p g$ is also needed. This quantity generally is no harder to calculate than $g_x$. But it will almost certainly require writing additional code, as the original software to solve just $g(x,p) = 0$ does not need it.
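
Since the forward solver typically has no reason to form $\partial_p g$, here is a minimal sketch of the kind of additional code involved (not from the original tutorial): a finite-difference approximation of $g_p$, column by column, usable with the hypothetical g(x, p) from the sketch above when no analytic expression has been coded.

    import numpy as np

    def g_p_fd(g, x, p, h=1e-7):
        # Finite-difference approximation of the n_x-by-n_p matrix of partials g_p.
        # x is held fixed throughout: these are partial, not total, derivatives.
        g0 = g(x, p)
        G = np.zeros((g0.size, p.size))
        for j in range(p.size):
            q = p.copy()
            q[j] += h
            G[:, j] = (g(x, q) - g0) / h
        return G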

2 PDE-constrained optimization problems

Partial differential equations are used to model physical processes. Optimization over a PDE arises in at least two broad contexts: determining parameters of a PDE-based model so that the field values match observations (an inverse problem); and design optimization: for example, of an airplane wing.

A common, straightforward, and very successful approach to solving PDE-constrained optimization problems is to solve the numerical optimization problem resulting from discretizing the PDE. Such problems take the form
$$\begin{aligned}
\text{minimize}_p \quad & f(x,p)\\
\text{subject to} \quad & g(x,p) = 0.
\end{aligned}$$
An alternative is to discretize the first-order optimality conditions corresponding to the original problem; this approach has been explored in various contexts for theoretical reasons but generally is much harder and is not as practically useful an approach.

Two broad approaches solve the numerical optimization problem. The first approach is that of modern, cutting-edge optimization packages: converge to a feasible solution ($g(x,p) = 0$) only as $f$ converges to a minimizer. The second approach is to require that $x$ be feasible at every step in $p$ ($g(x,p) = 0$).

The first approach is almost certainly the better approach for almost all problems. However, practical considerations turn out to make the second approach the better one in many applications.

For example, a research effort may have produced a complicated program to solve $g(x,p) = 0$ (the PDE or forward problem), and one now wants to solve an optimization problem (inverse problem) using this existing code. Additionally, other properties, particularly of time-dependent problems, can make the first approach very difficult to use.

In the second approach, the problem solver must evaluate $f(x,p)$, solve $g(x,p) = 0$, and provide the gradient $d_p f$. Section 1 provides the necessary tools at a high level of generality to perform the final step.
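
As a schematic of how the pieces fit together in the second approach, here is a sketch (not from the original tutorial) that hands the reduced problem to a standard gradient-based optimizer. It assumes the hypothetical adjoint_gradient routine from the earlier sketch is in scope; that routine solves the forward problem for a given $p$ and returns $f$ together with $d_p f$, so the optimizer only ever sees $p$.

    import numpy as np
    from scipy.optimize import minimize

    # Reduced-space ("second") approach: every objective evaluation solves
    # g(x, p) = 0 internally, and the adjoint solve supplies the gradient d_p f.
    result = minimize(adjoint_gradient,       # returns (f, d_p f) for a given p
                      np.array([0.3, 0.7]),   # initial guess for p
                      jac=True,               # the callable also returns the gradient
                      method="L-BFGS-B")
    print(result.x, result.fun)

Any gradient-based optimizer could stand in for L-BFGS-B here; the essential point is that the adjoint computation makes the gradient available at roughly the cost of one extra linear solve per iteration.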

But at least one class of problems deserves some additional discussion.

Time-dependent problems

Time-dependent problems have special structure for two reasons. First, the matrices of partial derivatives have very strong block structure; we shall not discuss this low-level topic here. Second, and the subject of this section, time-dependent problems are often treated by semi-discretization: the spatial derivatives are made explicit in the various operators, but the time integration is treated as being continuous; this method of lines induces a system of ODE. The method-of-lines treatment has two implications. First, the adjoint equation for the problem is also an ODE induced by the method of lines, and the derivation of the adjoint equation must reflect that. Second, the forward and adjoint ODE can be solved by standard adaptive ODE integrators.

The adjoint method

Consider the problem
$$\begin{aligned}
\text{minimize}_p \quad & F(x,p), \qquad \text{where } F(x,p) \equiv \int_0^T f(x,p,t)\, dt,\\
\text{subject to} \quad & h(x, \dot{x}, p, t) = 0 \qquad (4)\\
& g(x(0), p) = 0,
\end{aligned}$$
where $p$ is a vector of unknown parameters; $x$ is a (possibly vector-valued) function of time; $h(x, \dot{x}, p, t) = 0$ is an ODE in implicit form; and $g(x(0), p) = 0$ is the initial condition, which is a function of some of the unknown parameters.
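
As a concrete illustration of the forward/backward structure described above, here is a minimal sketch (not from the original tutorial) for the special case of a scalar explicit ODE $\dot{x} = -p x$ with fixed $x(0)$ and $F(p) = \int_0^T x^2\, dt$, i.e. $h(x,\dot{x},p,t) = \dot{x} + p x$. It uses the standard continuous-adjoint result for this explicit special case, namely $\dot{\lambda} = p\lambda - 2x$ with $\lambda(T) = 0$ integrated backward and $dF/dp = \int_0^T \lambda\,(-x)\, dt$, rather than the tutorial's general implicit-form treatment; both ODE are handed to a standard adaptive integrator, and the result is checked against a finite difference. All numerical values are made up.

    import numpy as np
    from scipy.integrate import solve_ivp, trapezoid

    T, p, x0 = 2.0, 0.8, 1.0

    # Forward ODE x' = -p x (the explicit form of h(x, x', p, t) = x' + p x = 0).
    fwd = solve_ivp(lambda t, x: -p * x, (0.0, T), [x0],
                    dense_output=True, rtol=1e-10, atol=1e-12)

    # Adjoint ODE for this special case, integrated backward from lam(T) = 0:
    #   lam' = p lam - 2 x(t)   (f = x^2 gives f_x = 2x; d(x')/dx = -p).
    adj = solve_ivp(lambda t, lam: p * lam - 2.0 * fwd.sol(t)[0], (T, 0.0), [0.0],
                    dense_output=True, rtol=1e-10, atol=1e-12)

    # Gradient: dF/dp = integral_0^T lam * (-x) dt
    # (f has no direct p dependence, and x(0) does not depend on p here).
    ts = np.linspace(0.0, T, 2001)
    x_ts, lam_ts = fwd.sol(ts)[0], adj.sol(ts)[0]
    dFdp = trapezoid(lam_ts * (-x_ts), ts)

    # Finite-difference check on F(p) = integral_0^T x(t)^2 dt.
    def F(pv):
        s = solve_ivp(lambda t, x: -pv * x, (0.0, T), [x0],
                      dense_output=True, rtol=1e-10, atol=1e-12)
        return trapezoid(s.sol(ts)[0] ** 2, ts)

    print(dFdp, (F(p + 1e-6) - F(p - 1e-6)) / 2e-6)   # the two values should agree closely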

