LECTURE SLIDES ON DYNAMIC PROGRAMMING
BASED ON LECTURES GIVEN AT THE
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
CAMBRIDGE, MASS., FALL 2004

DIMITRI P. BERTSEKAS

These lecture slides are based on the book "Dynamic Programming and Optimal Control," 2nd edition, Vols. I and II, Athena Scientific, 2002, by Dimitri P. Bertsekas. Updated: December 2004. The slides are copyrighted, but may be freely reproduced and distributed for any noncommercial purpose.

CONTENTS

These slides consist of 24 lectures, whose summary is given in the next 24 slides:

- Lectures 1-2: Basic dynamic programming algorithm (Chapter 1)
- Lectures 3-4: Deterministic discrete-time and shortest path problems (Chapter 2)
- Lectures 5-6: Stochastic discrete-time problems (Chapter 4)
- Lectures 7-9: Deterministic continuous-time optimal control (Chapter 4)
- Lectures 10-12: Problems of imperfect state information (Chapter 5)
- Lectures 13-16: Approximate DP - suboptimal control (Chapter 6)
- Lectures 17-20: Introduction to infinite horizon problems (Chapter 7)
- Lectures 21-24: Advanced topics on infinite horizon and approximate DP (Volume II)

DYNAMIC PROGRAMMING LECTURE 1
LECTURE OUTLINE
- Problem formulation
- Examples
- The basic problem
- Significance of feedback

DYNAMIC PROGRAMMING LECTURE 2
LECTURE OUTLINE
- The basic problem
- Principle of optimality
- DP example: deterministic problem
- DP example: stochastic problem
- The general DP algorithm
- State augmentation

DYNAMIC PROGRAMMING LECTURE 3
LECTURE OUTLINE
- Deterministic finite-state DP problems
- Backward shortest path algorithm
- Forward shortest path algorithm
- Shortest path examples
- Alternative shortest path algorithms

DYNAMIC PROGRAMMING LECTURE 4
LECTURE OUTLINE
- Label correcting methods for shortest paths
- Variants of label correcting methods
- Branch-and-bound as a shortest path algorithm

DYNAMIC PROGRAMMING LECTURE 5
LECTURE OUTLINE
- Examples of stochastic DP problems
- Linear-quadratic problems
- Inventory control

DYNAMIC PROGRAMMING LECTURE 6
LECTURE OUTLINE
- Stopping problems
- Scheduling problems
- Other applications

DYNAMIC PROGRAMMING LECTURE 7
LECTURE OUTLINE
- Deterministic continuous-time optimal control
- Examples
- Connection with the calculus of variations
- The Hamilton-Jacobi-Bellman equation as a continuous-time limit of the DP algorithm
- The Hamilton-Jacobi-Bellman equation as a sufficient condition

DYNAMIC PROGRAMMING LECTURE 8
LECTURE OUTLINE
- Deterministic continuous-time optimal control
- From the HJB equation to the Pontryagin Minimum Principle

DYNAMIC PROGRAMMING LECTURE 9
LECTURE OUTLINE
- Deterministic continuous-time optimal control
- Variants of the Pontryagin Minimum Principle
- Fixed terminal state
- Free terminal time
- Examples
- Discrete-time minimum principle

DYNAMIC PROGRAMMING LECTURE 10
LECTURE OUTLINE
- Problems with imperfect state info
- Reduction to the perfect state info case
- Machine repair example

DYNAMIC PROGRAMMING LECTURE 11
LECTURE OUTLINE
- Review of DP for imperfect state info
- Linear quadratic problems
- Separation of estimation and control

DYNAMIC PROGRAMMING LECTURE 12
LECTURE OUTLINE
- DP for imperfect state info
- Sufficient statistics
- Conditional state distribution as a sufficient statistic
- Finite-state systems

DYNAMIC PROGRAMMING LECTURE 13
LECTURE OUTLINE
- Suboptimal control
- Certainty equivalent control
- Implementations and approximations
- Issues in adaptive control

DYNAMIC PROGRAMMING LECTURE 14
LECTURE OUTLINE
- Limited lookahead policies
- Performance bounds
- Computational aspects
- Problem approximation approach
- Vehicle routing example
- Heuristic cost-to-go approximation
- Computer chess

DYNAMIC PROGRAMMING LECTURE 15
LECTURE OUTLINE
- Rollout algorithms
- Cost improvement property
- Discrete deterministic problems
- Sequential consistency and greedy algorithms
- Sequential improvement

DYNAMIC PROGRAMMING LECTURE 16
LECTURE OUTLINE
- More on rollout algorithms
- Simulation-based methods
- Approximations of rollout algorithms
- Rolling horizon approximations
- Discretization issues
- Other suboptimal approaches

DYNAMIC PROGRAMMING LECTURE 17
LECTURE OUTLINE
- Infinite horizon problems
- Stochastic shortest path problems
- Bellman's equation
- Dynamic programming - value iteration

DYNAMIC PROGRAMMING LECTURE 18
LECTURE OUTLINE
- Stochastic shortest path problems
- Policy iteration
- Linear programming
- Discounted problems

DYNAMIC PROGRAMMING LECTURE 19
LECTURE OUTLINE
- Average cost per stage problems
- Connection with stochastic shortest path problems
- Bellman's equation
- Value iteration
- Policy iteration

DYNAMIC PROGRAMMING LECTURE 20
LECTURE OUTLINE
- Control of continuous-time Markov chains
- Semi-Markov problems
- Problem formulation
- Equivalence to discrete-time problems
- Discounted problems
- Average cost problems

DYNAMIC PROGRAMMING LECTURE 21
LECTURE OUTLINE
- With this lecture, we start a four-lecture sequence on advanced dynamic programming and neuro-dynamic programming topics. References:
  - "Dynamic Programming and Optimal Control," Vol. II, by D. Bertsekas
  - "Neuro-Dynamic Programming," by D. Bertsekas and J. Tsitsiklis

- 1st lecture: discounted problems with infinite state space, stochastic shortest path problem
- 2nd lecture: DP with cost function approximation
- 3rd lecture: simulation-based policy and value iteration, temporal difference methods
- 4th lecture: other approximation methods: Q-learning, state aggregation, approximate linear programming, approximation in policy space

DYNAMIC PROGRAMMING LECTURE 22
LECTURE OUTLINE
- Approximate DP for large/intractable problems
- Approximate policy iteration
- Simulation-based policy iteration
- Actor-critic interpretation
- Learning how to play Tetris: a case study
- Approximate value iteration with function approximation

DYNAMIC PROGRAMMING LECTURE 23
LECTURE OUTLINE
- Simulation-based policy and value iteration methods
- λ-Least Squares Policy Evaluation method
- Temporal differences implementation
- Policy evaluation by approximate value iteration
- TD(λ)

DYNAMIC PROGRAMMING LECTURE 24
LECTURE OUTLINE
- Additional methods for approximate DP
- Q-learning
- Aggregation
- Linear programming with function approximation
- Gradient-based approximation in policy space

DYNAMIC PROGRAMMING LECTURE 1
LECTURE OUTLINE
- Problem formulation
- Examples
- The basic problem
- Significance of feedback

DP AS AN OPTIMIZATION METHODOLOGY

- Basic optimization problem

      min_{u ∈ U} g(u)

  where u is the optimization/decision variable, g(u) is the cost function, and U is the constraint set

- Categories of problems:
  - Discrete (U is finite) or continuous
  - Linear (g is linear and U is polyhedral) or nonlinear
  - Stochastic or deterministic: in stochastic problems the cost involves a stochastic parameter w, which is averaged, i.e., it has the form

        g(u) = E_w{ G(u, w) }

    where w is a random parameter.
- DP can deal with complex stochastic problems where information about w becomes available in stages, and the decisions are also made in stages and make use of this information.

STRUCTURE OF STOCHASTIC DP

- Discrete-time system

      x_{k+1} = f_k(x_k, u_k, w_k),   k = 0, 1, …, N-1

  - k: discrete time
  - x_k: state; summarizes past information that is relevant for future optimization
  - u_k: control; decision to be selected at time k from a given set
  - w_k: random parameter (also called disturbance or noise, depending on the context)
  - N: horizon, or number of times control is applied
- Cost function that is additive over time

      E{ g_N(x_N) + Σ_{k=0}^{N-1} g_k(x_k, u_k, w_k) }

INVENTORY CONTROL EXAMPLE

[Figure: inventory system. Stock x_k at period k plus stock u_k ordered at period k, less demand w_k at period k, gives stock x_{k+1} = x_k + u_k - w_k at period k+1; the cost of period k is c u_k + r(x_k + u_k - w_k).]

- Discrete-time system

      x_{k+1} = f_k(x_k, u_k, w_k) = x_k + u_k - w_k

- Cost function that is additive over time

      E{ g_N(x_N) + Σ_{k=0}^{N-1} g_k(x_k, u_k, w_k) } = E{ Σ_{k=0}^{N-1} ( c u_k + r(x_k + u_k - w_k) ) }

- Optimization over policies: rules/functions u_k = μ_k(x_k) that map states to controls

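To make the staged structure concrete, here is a minimal Python sketch (not from the slides) that simulates the inventory system above under a simple base-stock rule and estimates the expected additive cost by Monte Carlo. The horizon, cost parameters, demand distribution, and the policy mu are illustrative assumptions.

```python
import random

# Illustrative parameters (assumptions, not from the slides)
N = 10            # horizon
c = 1.0           # per-unit ordering cost

def r(x):         # holding/shortage cost applied to x_k + u_k - w_k
    return 2.0 * max(x, 0) + 4.0 * max(-x, 0)

def demand():     # random parameter w_k (assumed i.i.d. uniform here)
    return random.randint(0, 4)

def mu(x):        # a simple base-stock policy u_k = mu(x_k): order up to level 5
    return max(5 - x, 0)

def simulate(x0):
    """One sample of the additive cost sum_k [ c*u_k + r(x_k + u_k - w_k) ]."""
    x, total = x0, 0.0
    for k in range(N):
        u = mu(x)                     # control chosen with knowledge of x_k
        w = demand()                  # disturbance revealed only afterwards
        total += c * u + r(x + u - w)
        x = x + u - w                 # x_{k+1} = x_k + u_k - w_k
    return total

# Monte Carlo estimate of the expected cost of starting at x0 under policy mu
samples = [simulate(x0=0) for _ in range(10000)]
print(sum(samples) / len(samples))
```

The staged structure shows up directly in the loop: u_k is chosen knowing x_k, while w_k is revealed only after the decision.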
ADDITIONAL ASSUMPTIONS

- The set of values that the control u_k can take depends at most on x_k and not on prior x or u
- The probability distribution of w_k does not depend on past values w_{k-1}, …, w_0, but may depend on x_k and u_k
  - Otherwise past values of w or x would be useful for future optimization
- Sequence of events envisioned in period k:
  - x_k occurs according to x_k = f_{k-1}(x_{k-1}, u_{k-1}, w_{k-1})
  - u_k is selected with knowledge of x_k, i.e., u_k ∈ U(x_k)
  - w_k is random and generated according to a distribution P_{w_k}(x_k, u_k)

DETERMINISTIC FINITE-STATE PROBLEMS

- Scheduling example: find an optimal sequence of operations A, B, C, D (a brute-force sketch is given after the next example)
- A must precede B, and C must precede D
- Given startup costs S_A and S_C, and setup transition cost C_mn from operation m to operation n

[Figure: state transition diagram from the initial state, with nodes for the partial schedules (A, C, AB, AC, CA, CD, ABC, ACB, ACD, CAB, CAD, CDA, …) and arcs labeled by the costs S_A, S_C, and C_mn.]

STOCHASTIC FINITE-STATE PROBLEMS

- Example: find a two-game chess match strategy
- Timid play draws with probability p_d > 0 and loses with probability 1 - p_d; bold play wins with probability p_w < 1/2 and loses with probability 1 - p_w

[Figure: transition probability diagrams for the first and second game under timid play (probabilities p_d, 1 - p_d) and bold play (probabilities p_w, 1 - p_w), between match scores 0-0, 1-0, 0-1, and so on.]

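The scheduling example above is small enough to solve by brute force, which makes a useful baseline before the DP treatment later in the lecture. Below is a minimal sketch that enumerates the precedence-feasible orderings and picks the cheapest; the startup and transition cost values are hypothetical stand-ins for the slide's S_A, S_C, and C_mn, whose numeric values did not survive transcription.

```python
from itertools import permutations

# Hypothetical costs standing in for S_A, S_C, and C_mn from the slide
startup = {"A": 5, "C": 3}                       # cost of starting with A or C
trans = {("A", "B"): 2, ("A", "C"): 3, ("A", "D"): 4,
         ("B", "C"): 1, ("B", "D"): 3, ("C", "A"): 2,
         ("C", "B"): 4, ("C", "D"): 1, ("D", "A"): 3,
         ("D", "B"): 2}

def feasible(seq):
    # A must precede B, and C must precede D
    return seq.index("A") < seq.index("B") and seq.index("C") < seq.index("D")

def cost(seq):
    # startup cost of the first operation, then setup transition costs
    return startup[seq[0]] + sum(trans[(m, n)] for m, n in zip(seq, seq[1:]))

best = min((s for s in permutations("ABCD") if feasible(s)), key=cost)
print("".join(best), cost(best))   # ABCD 9 under the costs above
```

Only six of the 24 orderings are feasible here, so enumeration is trivial; the point of the DP machinery developed next is that it scales to problems where enumeration does not.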
BASIC PROBLEM

- System x_{k+1} = f_k(x_k, u_k, w_k), k = 0, …, N-1
- Control constraints u_k ∈ U(x_k)
- Probability distribution P_k(· | x_k, u_k) of w_k
- Policies π = {μ_0, …, μ_{N-1}}, where μ_k maps states x_k into controls u_k = μ_k(x_k) and is such that μ_k(x_k) ∈ U_k(x_k) for all x_k
- Expected cost of starting at x_0 is

      J_π(x_0) = E{ g_N(x_N) + Σ_{k=0}^{N-1} g_k(x_k, μ_k(x_k), w_k) }

- Optimal cost function: J*(x_0) = min_π J_π(x_0)
- An optimal policy π* satisfies J_{π*}(x_0) = J*(x_0); when produced by DP, π* is independent of x_0

SIGNIFICANCE OF FEEDBACK

- Open-loop versus closed-loop policies

[Figure: closed-loop configuration, with system x_{k+1} = f_k(x_k, u_k, w_k) driven by disturbance w_k and controller u_k = μ_k(x_k) fed back from the state.]

- In deterministic problems open loop is as good as closed loop
- Chess match example; value of information (a computational sketch is given after the next outline)

[Figure: first game timid play (draw with probability p_d) versus first game bold play (win with probability p_w), and the match scores that result.]

A NOTE ON THESE SLIDES

- These slides are a teaching aid, not a text
- Don't expect a rigorous mathematical development or precise mathematical statements
- Figures are meant to convey and enhance ideas, not to express them precisely
- Omitted proofs and a much fuller discussion can be found in the text, which these slides follow

DYNAMIC PROGRAMMING LECTURE 2
LECTURE OUTLINE
- The basic problem
- Principle of optimality
- DP example: deterministic problem
- DP example: stochastic problem
- The general DP algorithm
- State augmentation

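Returning to the chess match example: the following minimal sketch compares the optimal closed-loop strategy (choose timid or bold knowing the current score) with the best open-loop strategy (fix both choices in advance), illustrating the value of information. It assumes a win counts 1 point and a draw 1/2 each, and that a match tied after two games goes to sudden death won with probability p_w; these conventions and the parameter values are assumptions for illustration, not taken from the transcription.

```python
# Two-game chess match: closed-loop vs. open-loop play (value of information).
# Assumptions: win = 1 point, draw = 1/2 each; a match tied after two games
# goes to sudden death, which the player wins with probability p_w.
p_d = 0.9   # prob. of a draw under timid play (loss otherwise)
p_w = 0.45  # prob. of a win under bold play (loss otherwise)

def final_value(lead):
    # Probability of winning the match given the final score differential
    return 1.0 if lead > 0 else (p_w if lead == 0 else 0.0)

def J(k, lead):
    """Max prob. of winning the match: k games played, lead = score differential."""
    if k == 2:
        return final_value(lead)
    timid = p_d * J(k + 1, lead) + (1 - p_d) * J(k + 1, lead - 1)
    bold = p_w * J(k + 1, lead + 1) + (1 - p_w) * J(k + 1, lead - 1)
    return max(timid, bold)

def open_loop(choices):
    """Match-win prob. when both game choices are fixed in advance."""
    dist = {0: 1.0}                              # distribution of the lead
    for choice in choices:
        moves = ([(0, p_d), (-1, 1 - p_d)] if choice == "timid"
                 else [(+1, p_w), (-1, 1 - p_w)])
        new = {}
        for lead, p in dist.items():
            for step, q in moves:
                new[lead + step] = new.get(lead + step, 0.0) + p * q
        dist = new
    return sum(p * final_value(lead) for lead, p in dist.items())

pairs = [(a, b) for a in ("timid", "bold") for b in ("timid", "bold")]
print("closed-loop:", J(0, 0))                        # ~0.537: feedback helps
print("best open-loop:", max(open_loop(c) for c in pairs))   # ~0.425
```

With these numbers the closed-loop player wins more than half the time even though p_w < 1/2 and p_d < 1: playing bold first and then choosing based on the score beats every open-loop plan, which is exactly the value-of-information point made above.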
BASIC PROBLEM

- System x_{k+1} = f_k(x_k, u_k, w_k), k = 0, …, N-1
- Control constraints u_k ∈ U(x_k)
- Probability distribution P_k(· | x_k, u_k) of w_k
- Policies π = {μ_0, …, μ_{N-1}}, where μ_k maps states x_k into controls u_k = μ_k(x_k) and is such that μ_k(x_k) ∈ U_k(x_k) for all x_k
- Expected cost of starting at x_0 is

      J_π(x_0) = E{ g_N(x_N) + Σ_{k=0}^{N-1} g_k(x_k, μ_k(x_k), w_k) }

- Optimal cost function: J*(x_0) = min_π J_π(x_0)
- An optimal policy π* is one that satisfies J_{π*}(x_0) = J*(x_0)

PRINCIPLE OF OPTIMALITY

- Let π* = {μ_0*, μ_1*, …, μ_{N-1}*} be an optimal policy
- Consider the tail subproblem whereby we are at x_i at time i and wish to minimize the cost-to-go from time i to time N

      E{ g_N(x_N) + Σ_{k=i}^{N-1} g_k(x_k, μ_k(x_k), w_k) }

  and the tail policy {μ_i*, μ_{i+1}*, …, μ_{N-1}*}

[Figure: time line from 0 to N; the tail subproblem starts at state x_i at time i.]

- Principle of optimality: the tail policy is optimal for the tail subproblem
- DP first solves ALL tail subproblems of the final stage
- At the generic step, it solves ALL tail subproblems of a given time length, using the solution of the tail subproblems of shorter time length

DETERMINISTIC SCHEDULING EXAMPLE

- Find the optimal sequence of operations A, B, C, D (A must precede B and C must precede D)

[Figure: the scheduling problem's transition diagram, annotated with arc costs and, at each state-time pair, the optimal cost-to-go and optimal decision; the numeric labels did not survive transcription.]

- Start from the last tail subproblem and go backwards
- At each state-time pair, we record the optimal cost-to-go and the optimal decision

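Here is a minimal sketch of the backward pass just described, applied to the scheduling example: the state is (set of completed operations, last operation), tail subproblems are solved from the end of the schedule backwards, and the optimal cost-to-go and decision are returned at each state. The costs are the same hypothetical values used in the enumeration sketch earlier, since the slide's numbers are lost.

```python
# Backward DP for the scheduling example; hypothetical costs as before
startup = {"A": 5, "C": 3}
trans = {("A", "B"): 2, ("A", "C"): 3, ("A", "D"): 4,
         ("B", "C"): 1, ("B", "D"): 3, ("C", "A"): 2,
         ("C", "B"): 4, ("C", "D"): 1, ("D", "A"): 3,
         ("D", "B"): 2}

def allowed(done, op):
    # A must precede B, and C must precede D
    return (op not in done
            and (op != "B" or "A" in done)
            and (op != "D" or "C" in done))

def J(done, last):
    """Optimal cost-to-go and decision at the state (completed ops, last op)."""
    if len(done) == 4:
        return 0.0, None            # tail subproblem of length 0
    return min((trans[(last, op)] + J(done | {op}, op)[0], op)
               for op in "ABCD" if allowed(done, op))

# First decision: start with A or C, paying the startup cost S_A or S_C
cost, first = min((startup[op] + J({op}, op)[0], op) for op in "AC")
print(cost, first)                  # 9 A under these costs: matches the enumeration
```

Note how the recursion mirrors the principle of optimality: the value of a state depends only on optimally solved tail subproblems, never on how the state was reached.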
STOCHASTIC INVENTORY EXAMPLE

[Figure: the inventory system again, with stock x_k, order u_k, demand w_k, dynamics x_{k+1} = x_k + u_k - w_k, and period cost c u_k + r(x_k + u_k - w_k).]

- Tail subproblems of length 1:

      J_{N-1}(x_{N-1}) = min_{u_{N-1} ≥ 0} E_{w_{N-1}}{ c u_{N-1} + r(x_{N-1} + u_{N-1} - w_{N-1}) }

- Tail subproblems of length N - k:

      J_k(x_k) = min_{u_k ≥ 0} E_{w_k}{ c u_k + r(x_k + u_k - w_k) + J_{k+1}(x_k + u_k - w_k) }

DP ALGORITHM

- Start with J_N(x_N) = g_N(x_N), and go backwards using

      J_k(x_k) = min_{u_k ∈ U_k(x_k)} E_{w_k}{ g_k(x_k, u_k, w_k) + J_{k+1}(f_k(x_k, u_k, w_k)) },   k = 0, 1, …, N-1.

- Then J_0(x_0), generated at the last step, is equal to the optimal cost J*(x_0). Also, the policy π* = {μ_0*, …, μ_{N-1}*}, where μ_k*(x_k) minimizes the right side above for each x_k and k, is optimal.
- Justification: proof by induction that J_k(x_k) is equal to J_k*(x_k), defined as the optimal cost of the tail subproblem that starts at time k at state x_k.
- Note that ALL the tail subproblems are solved in addition to the original problem; hence the intensive computational requirements.

PROOF OF THE INDUCTION STEP

- Let π_k = {μ_k, μ_{k+1}, …, μ_{N-1}} denote a tail policy from time k onward
- Assume that J_{k+1}(x_{k+1}) = J*_{k+1}(x_{k+1}). Then

      J*_k(x_k) = min_{(μ_k, π_{k+1})} E_{w_k, …, w_{N-1}}{ g_k(x_k, μ_k(x_k), w_k) + g_N(x_N) + Σ_{i=k+1}^{N-1} g_i(x_i, μ_i(x_i), w_i) }

                = min_{μ_k} E_{w_k}{ g_k(x_k, μ_k(x_k), w_k) + min_{π_{k+1}} E_{w_{k+1}, …}{ … } }

      …
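A minimal sketch of the DP algorithm above, specialized to the inventory example: it computes J_k and the minimizing order quantity μ_k(x_k) by the backward recursion. The horizon, cost parameters, demand distribution, and the bounded integer state space are illustrative assumptions (the slides leave them unspecified), and g_N = 0 is assumed.

```python
# Backward DP recursion for the inventory example (illustrative parameters)
N, c = 5, 1.0
demand = {0: 0.2, 1: 0.5, 2: 0.3}                    # P(w_k = w), assumed
states = range(-5, 11)                               # bounded stock levels

def r(x):                                            # holding/shortage cost
    return 2.0 * max(x, 0) + 4.0 * max(-x, 0)

J = {N: {x: 0.0 for x in states}}                    # g_N = 0 assumed
mu = {}
for k in reversed(range(N)):                         # go backwards from k = N-1
    J[k], mu[k] = {}, {}
    for x in states:
        best_u, best_val = None, float("inf")
        for u in range(0, 11 - x):                   # u_k >= 0, keep x+u in range
            # E_w[ c*u + r(x+u-w) + J_{k+1}(x+u-w) ];
            # the max(..., -5) clamps at the state-space boundary (a sketch
            # simplification for stock levels that would fall below range)
            val = sum(p * (c * u + r(x + u - w) + J[k + 1][max(x + u - w, -5)])
                      for w, p in demand.items())
            if val < best_val:
                best_u, best_val = u, val
        J[k][x], mu[k][x] = best_val, best_u
print(J[0][0], mu[0][0])   # optimal cost and first order quantity from x0 = 0
```

This is exactly the recursion J_k(x_k) = min_u E_w{ c u + r(x + u - w) + J_{k+1}(x + u - w) } evaluated over a finite grid; note that, as the slide says, it solves every tail subproblem (every state at every time), which is the source of DP's computational demands.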

