Lecture 1: Introduction to Reinforcement Learning

Transcription of Lecture 1: Introduction to Reinforcement Learning

Lecture 1: Introduction to Reinforcement Learning
David Silver

Outline
1. Admin
2. About Reinforcement Learning
3. The Reinforcement Learning Problem
4. Inside An RL Agent
5. Problems within Reinforcement Learning

Admin: Class Information
- Thursdays, 9:30 to 11:00am
- Website:
- Email:

Admin: Assessment
- Assessment will be 50% coursework, 50% exam
- Coursework: Assignment A (an RL problem) and Assignment B (a kernels problem); coursework mark = max(assignment1, assignment2)
- Examination: Part A has 3 RL questions, Part B has 3 kernels questions; answer any 3 questions

Admin: Textbooks
- An Introduction to Reinforcement Learning, Sutton and Barto, MIT Press, 1998. 40 pounds; available free online!

- Algorithms for Reinforcement Learning, Szepesvári, Morgan and Claypool, 2010. 20 pounds; available free online!

About RL: Many Faces of Reinforcement Learning
Reinforcement learning sits at the intersection of many fields: machine learning (computer science), optimal control (engineering), the reward system (neuroscience), classical/operant conditioning (psychology), bounded rationality (economics), and operations research (mathematics).

About RL: Branches of Machine Learning
Machine learning has three main branches: supervised learning, unsupervised learning, and reinforcement learning.

About RL: Characteristics of Reinforcement Learning
What makes reinforcement learning different from other machine learning paradigms?

- There is no supervisor, only a reward signal
- Feedback is delayed, not instantaneous
- Time really matters (sequential, non-i.i.d. data)
- The agent's actions affect the subsequent data it receives

About RL: Examples of Reinforcement Learning
- Fly stunt manoeuvres in a helicopter
- Defeat the world champion at Backgammon
- Manage an investment portfolio
- Control a power station
- Make a humanoid robot walk
- Play many different Atari games better than humans

(Demo slides: Helicopter Manoeuvres, Bipedal Robots, Atari.)

The RL Problem: Rewards
- A reward R_t is a scalar feedback signal
- It indicates how well the agent is doing at step t
- The agent's job is to maximise cumulative reward
- Reinforcement learning is based on the reward hypothesis

Definition (Reward Hypothesis): All goals can be described by the maximisation of expected cumulative reward.

Do you agree with this statement?

The RL Problem: Examples of Rewards
- Fly stunt manoeuvres in a helicopter: +ve reward for following the desired trajectory; -ve reward for crashing
- Defeat the world champion at Backgammon: +/-ve reward for winning/losing a game
- Manage an investment portfolio: +ve reward for each $ in the bank
- Control a power station: +ve reward for producing power; -ve reward for exceeding safety thresholds
- Make a humanoid robot walk: +ve reward for forward motion; -ve reward for falling over
- Play many different Atari games better than humans: +/-ve reward for increasing/decreasing the score

The RL Problem: Sequential Decision Making
- Goal: select actions to maximise total future reward
- Actions may have long-term consequences
- Reward may be delayed
- It may be better to sacrifice immediate reward to gain more long-term reward (see the toy sketch below)
- Examples: a financial investment (may take months to mature); refuelling a helicopter (might prevent a crash in several hours); blocking opponent moves (might help winning chances many moves from now)
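To make the trade-off concrete, here is a toy numerical sketch (my own illustration, not from the slides): an agent that always grabs the immediate reward can end up with less total reward than one that waits.

```python
# Toy illustration (not from the lecture): sacrificing immediate reward
# can yield more total future reward.
reward_if_greedy  = [1, 0, 0, 0, 0]    # +1 now, nothing afterwards
reward_if_patient = [0, 0, 0, 0, 10]   # nothing now, +10 later

print(sum(reward_if_greedy))    # 1
print(sum(reward_if_patient))   # 10 -> waiting wins on total reward
```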

The RL Problem: Agent and Environment
At each step t the agent:
- Executes action A_t
- Receives observation O_t
- Receives scalar reward R_t
The environment:
- Receives action A_t
- Emits observation O_{t+1}
- Emits scalar reward R_{t+1}
t increments at the environment step. (A minimal code sketch of this loop follows the next section.)

The RL Problem: History and State
- The history is the sequence of observations, actions and rewards: H_t = O_1, R_1, A_1, ..., A_{t-1}, O_t, R_t
- i.e. all observable variables up to time t: the sensorimotor stream of a robot or embodied agent
- What happens next depends on the history: the agent selects actions; the environment selects observations/rewards
- State is the information used to determine what happens next
- Formally, state is a function of the history: S_t = f(H_t)
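The loop above is easy to write down in code. Below is a minimal sketch, assuming a hypothetical Env class with a step(action) method; all names are illustrative and not from any particular library.

```python
import random

class Env:
    """A stand-in environment: emits a random observation and a reward."""
    def step(self, action):
        observation = random.random()           # O_{t+1}
        reward = 1.0 if action == 1 else 0.0    # R_{t+1}
        return observation, reward

env = Env()
history = []                         # H_t = O_1, R_1, A_1, ..., O_t, R_t
obs, reward = 0.0, 0.0               # placeholder initial O_1, R_1
for t in range(5):
    history += [obs, reward]         # record O_t, R_t
    state = tuple(history[-6:])      # S_t = f(H_t): here, a recent window
    action = random.choice([0, 1])   # agent picks A_t (random policy here)
    history.append(action)           # record A_t
    obs, reward = env.step(action)   # environment emits O_{t+1}, R_{t+1}
```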

The RL Problem: Environment State
- The environment state S^e_t is the environment's private representation
- i.e. whatever data the environment uses to pick the next observation/reward
- The environment state is not usually visible to the agent
- Even if S^e_t is visible, it may contain irrelevant information

The RL Problem: Agent State
- The agent state S^a_t is the agent's internal representation
- i.e. whatever information the agent uses to pick the next action
- i.e. it is the information used by reinforcement learning algorithms
- It can be any function of history: S^a_t = f(H_t)

The RL Problem: Information State
An information state (a.k.a. Markov state) contains all useful information from the history.

Definition: A state S_t is Markov if and only if P[S_{t+1} | S_t] = P[S_{t+1} | S_1, ..., S_t]
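As a rough empirical illustration (my own, not from the lecture), one can compare P[next | last symbol] against P[next | last two symbols] on a toy process. If "last symbol" were a Markov state, conditioning on one more symbol would not change the distribution; for the second-order process below it clearly does.

```python
import random
from collections import Counter, defaultdict

random.seed(0)

# Toy process: the next symbol depends on the last TWO symbols, so
# "last symbol only" is NOT a Markov state for it.
seq = [0, 1]
for _ in range(100_000):
    nxt = (seq[-2] + seq[-1]) % 3 if random.random() < 0.9 else random.randrange(3)
    seq.append(nxt)

p_one = defaultdict(Counter)   # estimates P[next | last symbol]
p_two = defaultdict(Counter)   # estimates P[next | last two symbols]
for a, b, c in zip(seq, seq[1:], seq[2:]):
    p_one[b][c] += 1
    p_two[a, b][c] += 1

# Probability that the next symbol is 0, under each conditioning.
print({k: round(v[0] / sum(v.values()), 2) for k, v in p_one.items()})
print({k: round(v[0] / sum(v.values()), 2) for k, v in p_two.items()})
```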

- The future is independent of the past given the present: H_{1:t} -> S_t -> H_{t+1:∞}
- Once the state is known, the history may be thrown away
- The state is a sufficient statistic of the future
- The environment state S^e_t is Markov
- The history H_t is Markov

The RL Problem: Rat Example
What if the agent state is:
- the last 3 items in the sequence?
- counts for lights, bells and levers?
- the complete sequence?
(A sketch of these three candidate states follows.)
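The three candidate agent states from the rat example are easy to compute from a history of events; a minimal sketch (event names are illustrative):

```python
from collections import Counter

history = ["light", "light", "lever", "bell", "lever", "light"]

last_three = history[-3:]     # agent state = last 3 items
counts = Counter(history)     # agent state = counts of lights/bells/levers
full = tuple(history)         # agent state = the complete sequence

print(last_three)   # ['bell', 'lever', 'light']
print(counts)       # Counter({'light': 3, 'lever': 2, 'bell': 1})
```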

The RL Problem: Fully Observable Environments
- Full observability: the agent directly observes the environment state: O_t = S^a_t = S^e_t
- Agent state = environment state = information state
- Formally, this is a Markov decision process (MDP)
- (Next lecture, and the majority of this course)

The RL Problem: Partially Observable Environments
- Partial observability: the agent indirectly observes the environment:
  - A robot with camera vision isn't told its absolute location
  - A trading agent only observes current prices
  - A poker-playing agent only observes public cards
- Now agent state ≠ environment state
- Formally, this is a partially observable Markov decision process (POMDP)
- The agent must construct its own state representation S^a_t, e.g.:
  - Complete history: S^a_t = H_t
  - Beliefs of environment state: S^a_t = (P[S^e_t = s^1], ..., P[S^e_t = s^n])
  - Recurrent neural network: S^a_t = σ(S^a_{t-1} W_s + O_t W_o) (see the numpy sketch below)
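The recurrent update S^a_t = σ(S^a_{t-1} W_s + O_t W_o) can be sketched directly in numpy; the weights below are random placeholders, not trained parameters, and the dimensions are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n_state, n_obs = 4, 3
W_s = rng.normal(size=(n_state, n_state))   # recurrent weights
W_o = rng.normal(size=(n_obs, n_state))     # observation weights

def sigma(x):
    return 1.0 / (1.0 + np.exp(-x))         # logistic nonlinearity

s = np.zeros(n_state)                       # agent state S^a_0
for t in range(10):
    o = rng.normal(size=n_obs)              # observation O_t
    s = sigma(s @ W_s + o @ W_o)            # S^a_t = sigma(S^a_{t-1} W_s + O_t W_o)
```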

Inside An RL Agent: Major Components of an RL Agent
An RL agent may include one or more of these components:
- Policy: the agent's behaviour function
- Value function: how good is each state and/or action
- Model: the agent's representation of the environment

Inside An RL Agent: Policy
- A policy is the agent's behaviour
- It is a map from state to action, e.g.:
- Deterministic policy: a = π(s)
- Stochastic policy: π(a|s) = P[A_t = a | S_t = s]

Inside An RL Agent: Value Function
- The value function is a prediction of future reward
- It is used to evaluate the goodness/badness of states
- And therefore to select between actions, e.g.: v_π(s) = E_π[R_{t+1} + γ R_{t+2} + γ² R_{t+3} + ... | S_t = s]

(Figure slide: Value Function in Atari.)

Inside An RL Agent: Model
- A model predicts what the environment will do next
- P predicts the next state
- R predicts the next (immediate) reward, e.g.:
- P^a_{ss'} = P[S_{t+1} = s' | S_t = s, A_t = a]
- R^a_s = E[R_{t+1} | S_t = s, A_t = a]
(A toy sketch of both definitions follows.)
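A minimal sketch of both definitions, with toy numbers and γ assumed to be 0.9: a Monte-Carlo estimate of v_π(s) from sampled reward sequences, and a tabular model (P, R) held in dictionaries.

```python
gamma = 0.9

def discounted_return(rewards):
    g = 0.0
    for r in reversed(rewards):       # G = R_{t+1} + gamma * (rest)
        g = r + gamma * g
    return g

# Monte-Carlo estimate of v_pi(s): average discounted return over
# reward sequences sampled starting from s (toy data).
episodes_from_s = [[0, 0, 1], [0, 1, 0], [1, 0, 0]]
v_hat = sum(discounted_return(ep) for ep in episodes_from_s) / len(episodes_from_s)

# A tabular model: P[(s, a)] is a distribution over next states,
# R[(s, a)] is the expected immediate reward (toy entries).
P = {("s0", "a"): {"s1": 0.8, "s0": 0.2}}
R = {("s0", "a"): -1.0}
```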

Inside An RL Agent: Maze Example
- Rewards: -1 per time-step
- Actions: N, E, S, W
- States: the agent's location
(Figure: a maze with Start and Goal cells.)

Inside An RL Agent: Maze Example: Policy
(Figure: arrows represent the policy π(s) for each state s.)

Inside An RL Agent: Maze Example: Value Function
(Figure: numbers represent the value v_π(s) of each state s.)

Inside An RL Agent: Maze Example: Model
- The agent may have an internal model of the environment
- Dynamics: how actions change the state
- Rewards: how much reward comes from each state
- The model may be imperfect
(Figure: the grid layout represents the transition model P^a_{ss'}; the numbers represent the immediate reward R^a_s from each state s, the same for all actions a.)
(A value-iteration sketch of this maze follows.)
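A compact sketch of the maze example, assuming a stand-in 2x3 grid rather than the lecture's actual maze layout: value iteration with -1 per step recovers values equal to minus the number of steps to the goal, matching the pattern on the slide (the policy shown there is optimal, so v_π coincides with the optimal values).

```python
ROWS, COLS = 2, 3
GOAL = (0, 2)
WALLS = set()                            # add (row, col) cells to block
MOVES = {"N": (-1, 0), "S": (1, 0), "E": (0, 1), "W": (0, -1)}

def step(s, a):
    nxt = (s[0] + MOVES[a][0], s[1] + MOVES[a][1])
    if not (0 <= nxt[0] < ROWS and 0 <= nxt[1] < COLS) or nxt in WALLS:
        return s                         # bumping a wall leaves you in place
    return nxt

v = {(r, c): 0.0 for r in range(ROWS) for c in range(COLS)}
for _ in range(50):                      # value iteration, undiscounted
    for s in v:
        if s != GOAL:
            v[s] = max(-1 + v[step(s, a)] for a in MOVES)

print(v[(1, 0)])   # -3: three steps from the goal
```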

