
Reinforcement Learning: An Introduction



Richard S. Sutton and Andrew G. Barto
A Bradford Book
The MIT Press, Cambridge, Massachusetts; London, England

In memory of A. Harry Klopf

- Contents
  - Preface
  - Series Foreword
  - Summary of Notation
- I. The Problem
  - 1. Introduction
    - Reinforcement Learning
    - Examples
    - Elements of Reinforcement Learning
    - An Extended Example: Tic-Tac-Toe
    - Summary
    - History of Reinforcement Learning
    - Bibliographical Remarks
  - 2. Evaluative Feedback
    - An n-Armed Bandit Problem
    - Action-Value Methods
    - Softmax Action Selection
    - Evaluation Versus Instruction
    - Incremental Implementation
    - Tracking a Nonstationary Problem
    - Optimistic Initial Values
    - Reinforcement Comparison
    - Pursuit Methods
    - Associative Search
    - Conclusions
    - Bibliographical and Historical Remarks
  - 3. The Reinforcement Learning Problem
    - The Agent-Environment Interface
    - Goals and Rewards
    - Returns
    - Unified Notation for Episodic and Continuing Tasks
    - The Markov Property
    - Markov Decision Processes
    - Value Functions
    - Optimal Value Functions
    - Optimality and Approximation
    - Summary
    - Bibliographical and Historical Remarks
- II. Elementary Solution Methods
  - 4. Dynamic Programming
    - Policy Evaluation
    - Policy Improvement
    - Policy Iteration
    - Value Iteration
    - Asynchronous Dynamic Programming
    - Generalized Policy Iteration
    - Efficiency of Dynamic Programming
    - Summary
    - Bibliographical and Historical Remarks
  - 5. Monte Carlo Methods
    - Monte Carlo Policy Evaluation
    - Monte Carlo Estimation of Action Values
    - Monte Carlo Control
    - On-Policy Monte Carlo Control
    - Evaluating One Policy While Following Another
    - Off-Policy Monte Carlo Control
    - Incremental Implementation
    - Summary
    - Bibliographical and Historical Remarks
  - 6. Temporal-Difference Learning
    - TD Prediction
    - Advantages of TD Prediction Methods
    - Optimality of TD(0)
    - Sarsa: On-Policy TD Control
    - Q-Learning: Off-Policy TD Control
    - Actor-Critic Methods
    - R-Learning for Undiscounted Continuing Tasks
    - Games, Afterstates, and Other Special Cases
    - Summary
    - Bibliographical and Historical Remarks
- III. A Unified View
  - 7. Eligibility Traces
    - n-Step TD Prediction
    - The Forward View of TD(λ)
    - The Backward View of TD(λ)
    - Equivalence of Forward and Backward Views
    - Sarsa(λ)
    - Q(λ)
    - Eligibility Traces for Actor-Critic Methods
    - Replacing Traces
    - Implementation Issues
    - Variable λ
    - Conclusions
    - Bibliographical and Historical Remarks
  - 8. Generalization and Function Approximation
    - Value Prediction with Function Approximation
    - Gradient-Descent Methods
    - Linear Methods
    - Coarse Coding
    - Tile Coding
    - Radial Basis Functions
    - Kanerva Coding
    - Control with Function Approximation
    - Off-Policy Bootstrapping
    - Should We Bootstrap?
    - Summary
    - Bibliographical and Historical Remarks
  - 9. Planning and Learning
    - Models and Planning
    - Integrating Planning, Acting, and Learning
    - When the Model Is Wrong
    - Prioritized Sweeping
    - Full vs. Sample Backups
    - Trajectory Sampling
    - Heuristic Search
    - Summary
    - Bibliographical and Historical Remarks
  - 10. Dimensions of Reinforcement Learning
    - The Unified View
    - Other Frontier Dimensions
  - 11. Case Studies
    - TD-Gammon
    - Samuel's Checkers Player
    - The Acrobot
    - Elevator Dispatching
    - Dynamic Channel Allocation
    - Job-Shop Scheduling
- Bibliography
  - Index

Mark Lee 2005-01-04

Preface

We first came to focus on what is now known as reinforcement learning in late 1979. We were both at the University of Massachusetts, working on one of the earliest projects to revive the idea that networks of neuronlike adaptive elements might prove to be a promising approach to artificial adaptive intelligence.

The project explored the "heterostatic theory of adaptive systems" developed by A. Harry Klopf. Harry's work was a rich source of ideas, and we were permitted to explore them critically and compare them with the long history of prior work in adaptive systems. Our task became one of teasing the ideas apart and understanding their relationships and relative importance. This continues today, but in 1979 we came to realize that perhaps the simplest of the ideas, which had long been taken for granted, had received surprisingly little attention from a computational perspective. This was simply the idea of a learning system that wants something, that adapts its behavior in order to maximize a special signal from its environment. This was the idea of a "hedonistic" learning system, or, as we would say now, the idea of reinforcement learning.

Like others, we had a sense that reinforcement learning had been thoroughly explored in the early days of cybernetics and artificial intelligence. On closer inspection, though, we found that it had been explored only slightly. While reinforcement learning had clearly motivated some of the earliest computational studies of learning, most of these researchers had gone on to other things, such as pattern classification, supervised learning, and adaptive control, or they had abandoned the study of learning altogether. As a result, the special issues involved in learning how to get something from the environment received relatively little attention. In retrospect, focusing on this idea was the critical step that set this branch of research in motion.

Little progress could be made in the computational study of reinforcement learning until it was recognized that such a fundamental idea had not yet been thoroughly explored. The field has come a long way since then, evolving and maturing in several directions. Reinforcement learning has gradually become one of the most active research areas in machine learning, artificial intelligence, and neural network research. The field has developed strong mathematical foundations and impressive applications. The computational study of reinforcement learning is now a large field, with hundreds of active researchers around the world in diverse disciplines such as psychology, control theory, artificial intelligence, and neuroscience. Particularly important have been the contributions establishing and developing the relationships to the theory of optimal control and dynamic programming.

The overall problem of learning from interaction to achieve goals is still far from being solved, but our understanding of it has improved significantly. We can now place component ideas, such as temporal-difference learning, dynamic programming, and function approximation, within a coherent perspective with respect to the overall problem. Our goal in writing this book was to provide a clear and simple account of the key ideas and algorithms of reinforcement learning. We wanted our treatment to be accessible to readers in all of the related disciplines, but we could not cover all of these perspectives in detail. Our treatment takes exclusively the point of view of artificial intelligence and engineering, leaving coverage of connections to psychology, neuroscience, and other fields to others or to another time.

