Reinforcement Learning: An Introduction

Second edition, in progress

Richard S. Sutton and Andrew G. Barto

© 2012

A Bradford Book
The MIT Press
Cambridge, Massachusetts
London, England

In memory of A. Harry Klopf

Contents

Preface
Series Foreword
Summary of Notation

I The Problem

1 Introduction
    Reinforcement Learning
    Examples
    Elements of Reinforcement Learning
    An Extended Example: Tic-Tac-Toe
    Summary
    History of Reinforcement Learning
    Bibliographical Remarks

2 Bandit Problems
    An n-Armed Bandit Problem
    Action-Value Methods
    Softmax Action Selection
    Incremental Implementation
    Tracking a Nonstationary Problem
    Optimistic Initial Values
    Associative Search (Contextual Bandits)
    Conclusions
    Bibliographical and Historical Remarks

3 The Reinforcement Learning Problem
    The Agent-Environment Interface
    Goals and Rewards
    Returns
    Unified Notation for Episodic and Continuing Tasks
    The Markov Property
    Markov Decision Processes
    Value Functions
    Optimal Value Functions
    Optimality and Approximation
    Summary
    Bibliographical and Historical Remarks

II Tabular Action-Value Methods

4 Dynamic Programming
    Policy Evaluation
    Policy Improvement
    Policy Iteration
    Value Iteration
    Asynchronous Dynamic Programming
    Generalized Policy Iteration
    Efficiency of Dynamic Programming
    Summary
    Bibliographical and Historical Remarks

5 Monte Carlo Methods
    Monte Carlo Policy Evaluation
    Monte Carlo Estimation of Action Values
    Monte Carlo Control
    On-Policy Monte Carlo Control
    Evaluating One Policy While Following Another (Off-policy Policy Evaluation)
    Off-Policy Monte Carlo Control
    Incremental Implementation
    Summary
    Bibliographical and Historical Remarks

6 Temporal-Difference Learning
    TD Prediction
    Advantages of TD Prediction Methods
    Optimality of TD(0)
    Sarsa: On-Policy TD Control
    Q-learning: Off-Policy TD Control
    Games, Afterstates, and Other Special Cases
    Summary
    Bibliographical and Historical Remarks

7 Eligibility Traces
    n-Step TD Prediction
    The Forward View of TD(λ)
    The Backward View of TD(λ)
    Equivalence of Forward and Backward Views
    Sarsa(λ)
    Q(λ)
    Replacing Traces
    Implementation Issues
    Variable λ
    Conclusions
    Bibliographical and Historical Remarks

8 Planning and Learning with Tabular Methods
    Models and Planning
    Integrating Planning, Acting, and Learning
    When the Model Is Wrong
    Prioritized Sweeping
    Full vs. Sample Backups
    Trajectory Sampling
    Heuristic Search
    Summary
    Bibliographical and Historical Remarks

III Approximate Solution Methods

9 On-policy Approximation of Action Values
    Value Prediction with Function Approximation
    Gradient-Descent Methods
    Linear Methods
    Control with Function Approximation
    Should We Bootstrap?
    Summary
    Bibliographical and Historical Remarks

10 Off-policy Approximation of Action Values

11 Policy Approximation
    Actor-Critic Methods
    R-learning and the Average-Reward Setting

12 State Estimation

13 Temporal Abstraction

IV Frontiers

14 Biological Reinforcement Learning

15 Applications and Case Studies
    TD-Gammon
    Samuel's Checkers Player
    The Acrobot
    Elevator Dispatching
    Dynamic Channel Allocation
    Job-Shop Scheduling

16 Prospects
    The Unified View
    Other Frontier Dimensions

References
Index

Preface

We first came to focus on what is now known as reinforcement learning in late 1979. We were both at the University of Massachusetts, working on one of the earliest projects to revive the idea that networks of neuronlike adaptive elements might prove to be a promising approach to artificial adaptive intelligence. The project explored the heterostatic theory of adaptive systems developed by A. Harry Klopf. Harry's work was a rich source of ideas, and we were permitted to explore them critically and compare them with the long history of prior work in adaptive systems.

Our task became one of teasing the ideas apart and understanding their relationships and relative importance. This continues today, but in 1979 we came to realize that perhaps the simplest of the ideas, which had long been taken for granted, had received surprisingly little attention from a computational perspective. This was simply the idea of a learning system that wants something, that adapts its behavior in order to maximize a special signal from its environment. This was the idea of a hedonistic learning system, or, as we would say now, the idea of reinforcement learning. Like others, we had a sense that reinforcement learning had been thoroughly explored in the early days of cybernetics and artificial intelligence.

On closer inspection, though, we found that it had been explored only slightly. While reinforcement learning had clearly motivated some of the earliest computational studies of learning, most of these researchers had gone on to other things, such as pattern classification, supervised learning, and adaptive control, or they had abandoned the study of learning altogether. As a result, the special issues involved in learning how to get something from the environment received relatively little attention. In retrospect, focusing on this idea was the critical step that set this branch of research in motion. Little progress could be made in the computational study of reinforcement learning until it was recognized that such a fundamental idea had not yet been thoroughly explored. The field has come a long way since then, evolving and maturing in several directions.

Reinforcement learning has gradually become one of the most active research areas in machine learning, artificial intelligence, and neural network research. The field has developed strong mathematical foundations and impressive applications. The computational study of reinforcement learning is now a large field, with hundreds of active researchers around the world in diverse disciplines such as psychology, control theory, artificial intelligence, and neuroscience. Particularly important have been the contributions establishing and developing the relationships to the theory of optimal control and dynamic programming. The overall problem of learning from interaction to achieve goals is still far from being solved, but our understanding of it has improved significantly.
