So far, we have concentrated on model-free algorithms that learn directly from interaction. This chapter shifts focus to model-based reinforcement learning. The central idea is for the agent to construct an internal model of how the environment behaves. This involves learning approximations of the state transition probabilities, often denoted as P(s′∣s,a), and the expected reward function, R(s,a,s′).
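To make this concrete, here is a minimal sketch of a learned tabular model. The class name `TabularModel` and its methods are illustrative, not a fixed API: it estimates P(s′∣s,a) from empirical transition counts and R(s,a,s′) as the average observed reward for each transition.

```python
import random
from collections import defaultdict

class TabularModel:
    """Maximum-likelihood model of a small, discrete environment (illustrative sketch)."""

    def __init__(self):
        # counts[(s, a)][s_next] = number of times s_next followed (s, a)
        self.counts = defaultdict(lambda: defaultdict(int))
        # reward_sums[(s, a)][s_next] = total reward observed for that transition
        self.reward_sums = defaultdict(lambda: defaultdict(float))

    def update(self, s, a, r, s_next):
        """Record one real transition (s, a) -> s_next with reward r."""
        self.counts[(s, a)][s_next] += 1
        self.reward_sums[(s, a)][s_next] += r

    def transition_prob(self, s, a, s_next):
        """Empirical estimate of P(s' | s, a)."""
        total = sum(self.counts[(s, a)].values())
        return self.counts[(s, a)][s_next] / total if total else 0.0

    def expected_reward(self, s, a, s_next):
        """Empirical estimate of R(s, a, s') as the average observed reward."""
        n = self.counts[(s, a)][s_next]
        return self.reward_sums[(s, a)][s_next] / n if n else 0.0

    def sample(self, s, a):
        """Simulate one step from a previously observed (s, a): returns (r, s')."""
        successors = list(self.counts[(s, a)])
        weights = [self.counts[(s, a)][sp] for sp in successors]
        s_next = random.choices(successors, weights=weights)[0]
        return self.expected_reward(s, a, s_next), s_next
```

This count-based approach only works for small, discrete state and action spaces; later sections of the chapter discuss how learned function approximators take its place in larger problems.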
Once a model is learned, even an imperfect one, the agent can use it internally to plan or to simulate additional experience, potentially making more efficient use of real interactions. We will examine methods for learning these dynamics models and for integrating them with planning. Key topics include the Dyna-Q architecture, using learned models for trajectory sampling, the fundamentals of Monte Carlo Tree Search (MCTS) and its combination with learned models, and connections to Model Predictive Control (MPC). We will also consider the practical issues of model accuracy and the computational cost of planning. By the end of this chapter, you'll be familiar with the rationale behind model-based RL and the common techniques used to implement it.
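As a rough preview of how simulated experience is used, the sketch below shows a Dyna-Q-style planning step. It assumes the `TabularModel` from the previous sketch and a Q-table stored as nested dictionaries; the function name and parameters are hypothetical placeholders, not part of any library.

```python
import random

def planning_updates(Q, model, alpha=0.1, gamma=0.99, n_planning_steps=10):
    """Run extra Q-learning backups on transitions simulated by the learned model."""
    observed = list(model.counts)           # (s, a) pairs seen in real experience
    if not observed:
        return
    for _ in range(n_planning_steps):
        s, a = random.choice(observed)      # revisit a previously seen state-action pair
        r, s_next = model.sample(s, a)      # simulate one step with the model
        # Standard Q-learning target, applied to the simulated transition
        best_next = max(Q[s_next].values()) if Q[s_next] else 0.0
        Q[s][a] += alpha * (r + gamma * best_next - Q[s][a])

# Example usage, assuming Q = collections.defaultdict(lambda: defaultdict(float)):
# planning_updates(Q, model, n_planning_steps=20)
```

The key point is that these backups are "free" in the sense that they consume no new environment interactions; their value depends entirely on how accurate the learned model is, an issue we return to later in the chapter.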