As we saw in Chapter 1, the tabular approach to storing Q-values breaks down when dealing with environments that have a very large number of states, like those represented by images or complex feature vectors. Imagine trying to create a table entry for every possible configuration of pixels on a screen. It's simply not feasible due to memory limitations and the sheer impossibility of visiting every state to learn its value. Furthermore, tabular methods cannot generalize. If the agent encounters a state it hasn't seen before, even one very similar to a known state, it has no basis for estimating its value.
To handle these challenges, we introduce function approximation. Instead of storing an exact value for each Q(s,a) pair in a table, we use a function with learnable parameters to estimate these values. Our goal is to find a function Q(s,a;θ) parameterized by a vector θ that approximates the true action-value function Q(s,a).
Neural networks are particularly well suited for this task. They are powerful function approximators capable of learning complex, non-linear relationships between inputs and outputs. Crucially, they excel at handling high-dimensional inputs, such as raw pixel data from game screens or sensor readings from robots, and can learn meaningful features from this data automatically. By using a neural network, we aim to learn a parameter vector θ (representing the network's weights and biases) such that:
Q(s,a) ≈ Q(s,a;θ)

How do we structure a neural network to represent Q(s,a;θ)? A common and effective approach, especially for environments with discrete action spaces (like moving left, right, up, or down), is to design a network that takes the state s as input and outputs a vector of Q-values, one for each possible action a in that state.
Figure: a neural network takes a state s as input and outputs estimated Q-values for each possible action a_i in that state. The network's parameters are denoted by θ.
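As a concrete illustration, here is a minimal sketch of such a network in PyTorch. The state dimension, number of actions, and hidden-layer width are arbitrary choices for the example, not values tied to any particular environment:

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Maps a state vector to one Q-value per discrete action."""

    def __init__(self, state_dim, num_actions, hidden_dim=64):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(state_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_actions),  # one output per action
        )

    def forward(self, state):
        # state: tensor of shape (batch_size, state_dim)
        # returns: tensor of shape (batch_size, num_actions), i.e. Q(s, a; θ) for every a
        return self.layers(state)

# Example: a 4-dimensional state and 2 discrete actions (illustrative sizes).
q_network = QNetwork(state_dim=4, num_actions=2)
```

The parameter vector θ in the equations corresponds to the weights and biases of these layers; training adjusts them so the outputs approach the true action values.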
This architecture is efficient because it allows us to compute the Q-values for all actions in a given state with a single forward pass through the network. This is useful for action selection, where we typically need to find the action with the highest Q-value (i.e., argmax_a Q(s,a;θ)).
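A short, self-contained sketch of greedy action selection follows; a single linear layer stands in for any Q-network here, and the state values and layer sizes are placeholders:

```python
import torch
import torch.nn as nn

# Stand-in Q-network: 4-dimensional state, 2 discrete actions (illustrative sizes).
q_network = nn.Linear(4, 2)

state = torch.tensor([[0.1, -0.2, 0.05, 0.0]])   # batch containing a single state
with torch.no_grad():
    q_values = q_network(state)                  # one forward pass -> Q-value for every action
action = q_values.argmax(dim=1).item()           # greedy choice: argmax_a Q(s, a; θ)
```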
The most significant advantage of using a neural network as a function approximator is generalization. Because the network learns underlying patterns in the state space, it can produce reasonable Q-value estimates even for states it hasn't encountered during training, provided they are similar to states it has seen. If two states s1 and s2 are represented by similar input vectors, the network will likely produce similar Q-value outputs for them. This allows the agent to leverage past experience much more effectively than tabular methods, leading to faster learning in large state spaces.
For example, in an Atari game, the network might learn that certain visual patterns (like an approaching enemy) are associated with negative outcomes regardless of their exact pixel location on the screen. It learns a compressed, meaningful representation of the state that captures the important information for decision-making.
By replacing the Q-table with a neural network, we lay the foundation for Deep Q-Networks (DQN). The subsequent sections will detail how we train this network's parameters θ using techniques adapted from Q-learning, incorporating mechanisms like Experience Replay and Target Networks to ensure stable and effective learning.