So far, we have concentrated on model-free algorithms that learn directly from interaction. This chapter shifts focus to model-based reinforcement learning. The central idea is for the agent to construct an internal model of how the environment behaves. This involves learning approximations of the state transition probabilities, often denoted as P(s′∣s,a), and the expected reward function, R(s,a,s′).
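To make this concrete, here is a minimal sketch of a tabular dynamics model learned from counts of observed transitions. It estimates P(s′∣s,a) from empirical visit frequencies and R(s,a,s′) as a running average of observed rewards; the class and method names are illustrative, not from any particular library.

```python
from collections import defaultdict

class TabularModel:
    """Illustrative tabular dynamics model learned from experience.

    P(s' | s, a) is estimated from transition counts, and R(s, a, s')
    as the empirical mean of rewards observed for that transition.
    """

    def __init__(self):
        self.transition_counts = defaultdict(lambda: defaultdict(int))  # (s, a) -> {s': count}
        self.reward_sums = defaultdict(float)                           # (s, a, s') -> summed reward
        self.reward_counts = defaultdict(int)                           # (s, a, s') -> count

    def update(self, s, a, r, s_next):
        """Record one real transition (s, a, r, s')."""
        self.transition_counts[(s, a)][s_next] += 1
        self.reward_sums[(s, a, s_next)] += r
        self.reward_counts[(s, a, s_next)] += 1

    def transition_probs(self, s, a):
        """Estimated P(s' | s, a) as a dict mapping s' -> probability."""
        counts = self.transition_counts[(s, a)]
        total = sum(counts.values())
        return {s_next: c / total for s_next, c in counts.items()}

    def expected_reward(self, s, a, s_next):
        """Estimated R(s, a, s') as the mean of observed rewards."""
        n = self.reward_counts[(s, a, s_next)]
        return self.reward_sums[(s, a, s_next)] / n if n else 0.0
```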
Once a model is learned, even an imperfect one, the agent can use it internally for planning or simulating experiences, potentially leading to more efficient use of real interactions. We will examine methods for learning these dynamics models and how to integrate them with planning. Key topics include the Dyna-Q architecture, using learned models for trajectory sampling, the fundamentals of Monte Carlo Tree Search (MCTS) and its integration, and connections to Model Predictive Control (MPC). We will also consider the practical issues associated with model accuracy and the computational cost of planning. By the end of this chapter, you'll be familiar with the rationale, techniques, and common approaches used in model-based RL.
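As a preview of the Dyna-Q idea developed later in the chapter, the sketch below shows how a learned model can generate simulated updates between real interactions. It assumes a simple deterministic model that stores the last observed (reward, next state) for each visited state-action pair; the dictionary-based structure and function name are our own illustration, not a fixed API.

```python
import random

def dyna_q_planning(Q, model, actions, n_planning=10, alpha=0.1, gamma=0.95):
    """Run Dyna-Q style planning updates on simulated transitions.

    Assumptions (illustrative): Q maps (s, a) -> value, model maps each
    previously observed (s, a) -> (reward, next_state), and `actions`
    lists the actions available in every state.
    """
    observed_pairs = list(model.keys())
    for _ in range(n_planning):
        s, a = random.choice(observed_pairs)       # revisit a previously experienced pair
        r, s_next = model[(s, a)]                  # simulate its outcome using the model
        best_next = max(Q.get((s_next, b), 0.0) for b in actions)
        td_target = r + gamma * best_next
        # Standard Q-learning update applied to the simulated transition
        Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (td_target - Q.get((s, a), 0.0))
    return Q
```

Each planning step reuses stored experience without touching the environment, which is the source of the sample-efficiency gains discussed in this chapter.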
5.1 Rationale for Model-Based RL
5.2 Taxonomy of Model-Based Methods
5.3 Learning Environment Dynamics Models
5.4 Dyna Architectures: Integrating Learning and Planning
5.5 Planning with Learned Models: Trajectory Sampling
5.6 Monte Carlo Tree Search (MCTS) Fundamentals
5.7 Integrating MCTS with Learned Models
5.8 Model Predictive Control (MPC) Connections
5.9 Challenges: Model Accuracy and Computational Cost
5.10 Simple Model-Based Agent Hands-on Practical