Building upon the concept of function approximation from the previous chapter, we now focus on combining Q-learning with deep neural networks. This combination, known as Deep Q-Networks (DQN), allows agents to learn effective policies in environments with high-dimensional state spaces, such as raw pixel inputs from games.
This chapter explains how DQNs work. We will first discuss the motivation for using deep neural networks to approximate the action-value function, Q(s,a;θ), where θ represents the network parameters. We will then address the instabilities that arise when these networks are trained on reinforcement learning data, notably the correlation between consecutive samples and the constantly shifting target values. You will learn about two core techniques designed to mitigate these issues: experience replay, which stores past transitions and samples them randomly, and separate, periodically updated target networks that provide stable Q-value targets. By the end of this chapter, you will understand the structure of the standard DQN algorithm and the reasoning behind its key components. A brief code sketch of both ideas follows below, ahead of their dedicated sections.
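As a preview of the two stabilization techniques, the sketch below shows a minimal experience replay buffer and a target-network sync step in Python. The names `ReplayBuffer` and `sync_target`, the default capacity, and the assumption that the networks are PyTorch modules are illustrative choices for this sketch, not a fixed API; Sections 7.3 and 7.4 cover both mechanisms in detail.

```python
import random
from collections import deque


class ReplayBuffer:
    """Stores past transitions and returns uniformly sampled mini-batches."""

    def __init__(self, capacity=100_000):
        # A bounded deque evicts the oldest transitions once capacity is reached.
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random sampling breaks the correlation between
        # consecutive transitions gathered along a single trajectory.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)


def sync_target(online_net, target_net):
    # Assumes both networks are PyTorch nn.Module instances. Copying the
    # online parameters θ into the target network's parameters θ⁻ only
    # every fixed number of steps keeps the Q-value targets stable between syncs.
    target_net.load_state_dict(online_net.state_dict())
```

Sampling uniformly from a large buffer lets a single transition contribute to many gradient updates, while the infrequent target sync keeps the regression target from moving at every training step.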
7.1 Combining Q-Learning with Deep Learning
7.2 Challenges with Neural Networks in RL
7.3 Experience Replay Mechanism
7.4 Fixed Q-Targets (Target Networks)
7.5 The DQN Algorithm Structure
7.6 Architectural Considerations for DQNs
7.7 Hands-on Practical: Building a Basic DQN