Reinforcement Learning: An Introduction
Authors : Richard S. Sutton, Department of Computing Science, University of Alberta; and Andrew G. Barto, Department of Computer Science, University of Massachusetts Amherst
ISBN : 0262193981
Pages : 432
Publisher : The MIT Press
Publication Date : 1998
The book consists of three parts. Part I is introductory and problem oriented. We focus on the simplest aspects of reinforcement learning and on its main distinguishing features. One full chapter is devoted to introducing the reinforcement learning problem whose solution we explore in the rest of the book. Part II presents what we see as the three most important elementary solution methods: dynamic programming, simple Monte Carlo methods, and temporal-difference learning. The first of these is a planning method and assumes explicit knowledge of all aspects of a problem, whereas the other two are learning methods. Part III is concerned with generalizing these methods and blending them. Eligibility traces allow unification of Monte Carlo and temporal-difference methods, and function approximation methods such as artificial neural networks extend all the methods so that they can be applied to much larger problems. We bring planning and learning methods together again and relate them to heuristic search. Finally, we summarize our view of the state of reinforcement learning research and briefly present case studies, including some of the most impressive applications of reinforcement learning to date.
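To give a flavor of the learning methods Part II develops, below is a minimal sketch of tabular TD(0) prediction on a small random walk, in the spirit of the five-state example the book uses to introduce temporal-difference learning. The environment helpers and constants here are illustrative assumptions, not code from the book.

```python
import random

N_STATES = 5          # non-terminal states 0..4; termination off either end
ALPHA = 0.1           # step-size parameter
GAMMA = 1.0           # undiscounted episodic task

def run_episode(V):
    """Update the state-value estimates V in place along one episode."""
    state = N_STATES // 2                 # start in the middle state
    while True:
        action = random.choice([-1, 1])   # equiprobable random-walk policy
        next_state = state + action
        if next_state < 0:                # left terminal: reward 0
            reward, done = 0.0, True
        elif next_state >= N_STATES:      # right terminal: reward +1
            reward, done = 1.0, True
        else:
            reward, done = 0.0, False
        target = reward + (0.0 if done else GAMMA * V[next_state])
        V[state] += ALPHA * (target - V[state])   # TD(0) update
        if done:
            return
        state = next_state

V = [0.5] * N_STATES                      # initial value estimates
for _ in range(1000):
    run_episode(V)
print([round(v, 2) for v in V])           # approaches 1/6, 2/6, ..., 5/6
```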
- Contents
- I. The Problem
- 1. Introduction
- 2. Evaluative Feedback
- 2.1 An n-Armed Bandit Problem
- 2.2 Action-Value Methods
- 2.3 Softmax Action Selection
- 2.4 Evaluation Versus Instruction
- 2.5 Incremental Implementation
- 2.6 Tracking a Nonstationary Problem
- 2.7 Optimistic Initial Values
- 2.8 Reinforcement Comparison
- 2.9 Pursuit Methods
- 2.10 Associative Search
- 2.11 Conclusions
- 2.12 Bibliographical and Historical Remarks
- 3. The Reinforcement Learning Problem
- 3.1 The Agent-Environment Interface
- 3.2 Goals and Rewards
- 3.3 Returns
- 3.4 Unified Notation for Episodic and Continuing Tasks
- 3.5 The Markov Property
- 3.6 Markov Decision Processes
- 3.7 Value Functions
- 3.8 Optimal Value Functions
- 3.9 Optimality and Approximation
- 3.10 Summary
- 3.11 Bibliographical and Historical Remarks
- II. Elementary Solution Methods
- 4. Dynamic Programming
- 5. Monte Carlo Methods
- 5.1 Monte Carlo Policy Evaluation
- 5.2 Monte Carlo Estimation of Action Values
- 5.3 Monte Carlo Control
- 5.4 On-Policy Monte Carlo Control
- 5.5 Evaluating One Policy While Following Another
- 5.6 Off-Policy Monte Carlo Control
- 5.7 Incremental Implementation
- 5.8 Summary
- 5.9 Bibliographical and Historical Remarks
- 6. Temporal-Difference Learning
- 6.1 TD Prediction
- 6.2 Advantages of TD Prediction Methods
- 6.3 Optimality of TD(0)
- 6.4 Sarsa: On-Policy TD Control
- 6.5 Q-Learning: Off-Policy TD Control
- 6.6 Actor-Critic Methods
- 6.7 R-Learning for Undiscounted Continuing Tasks
- 6.8 Games, Afterstates, and Other Special Cases
- 6.9 Summary
- 6.10 Bibliographical and Historical Remarks
- III. A Unified View
- 7. Eligibility Traces
- 7.1 n-Step TD Prediction
- 7.2 The Forward View of TD(λ)
- 7.3 The Backward View of TD(λ)
- 7.4 Equivalence of Forward and Backward Views
- 7.5 Sarsa(λ)
- 7.6 Q(λ)
- 7.7 Eligibility Traces for Actor-Critic Methods
- 7.8 Replacing Traces
- 7.9 Implementation Issues
- 7.10 Variable λ
- 7.11 Conclusions
- 7.12 Bibliographical and Historical Remarks
- 8. Generalization and Function Approximation
- 9. Planning and Learning
- 10. Dimensions of Reinforcement Learning
- 11. Case Studies
- Bibliography