As we saw in the previous chapter, traditional methods like Q-learning and SARSA rely on tables to store value estimates (Q-values) for every state or state-action pair. This approach works well when the number of states and actions is manageably small, such as in simple grid worlds or tic-tac-toe.
However, many interesting problems have enormous or even infinite state spaces. Consider these scenarios:

- An Atari game, where the state is the raw screen image and the number of possible pixel configurations is astronomical.
- A robot arm, where joint angles and velocities are continuous, so the state space is infinite.
- The game of Go, with roughly 10^170 legal board positions.
In these situations, the tabular approach breaks down completely due to the curse of dimensionality. We face two major challenges:

- Memory: storing a separate estimate for every state-action pair quickly becomes infeasible (see the quick calculation below).
- Generalization: the agent can visit only a vanishing fraction of the states, so most table entries would never be updated, and experience in one state tells the table nothing about similar states.
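To put a number on the memory problem, here is a quick back-of-the-envelope calculation in Python. The 84×84 grayscale frame is the preprocessed input size used in the original DQN work; the point is only the order of magnitude:

```python
import math

# The preprocessed Atari input in the original DQN setup is an 84x84
# grayscale frame with 256 intensity levels per pixel.
num_pixels = 84 * 84
log10_states = num_pixels * math.log10(256)
print(f"Roughly 10^{log10_states:.0f} distinct frames")  # ~10^16993
```

Even a table with one entry per distinct frame would dwarf the number of atoms in the observable universe (roughly 10^80).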
To handle these complex, large-scale problems, we need a more scalable approach. Instead of storing explicit values for every single state-action pair, we can approximate the value function using a parameterized function. This technique is called function approximation.
The core idea is to use a function, let's call it Q̂(s,a;θ), that takes the state s (and possibly the action a) as input and outputs an estimate of the true Q-value Q(s,a). The function has a set of parameters, denoted by θ, which we will learn and adjust based on the agent's experience.
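As a concrete, deliberately simple illustration, here is a minimal sketch of a linear approximator, where θ is a weight vector and Q̂(s,a;θ) is the dot product of θ with a hand-crafted feature vector x(s,a). The feature vector, dimensions, step size, and target below are illustrative assumptions, not values from this chapter:

```python
import numpy as np

def q_hat(x_sa, theta):
    """Linear approximation: Q-hat(s, a; theta) = theta . x(s, a)."""
    return theta @ x_sa

# Illustrative setup: each (state, action) pair is encoded by a
# hand-crafted 4-dimensional feature vector x(s, a).
theta = np.zeros(4)                      # parameters theta, adjusted from experience
x_sa = np.array([1.0, 0.5, -0.2, 0.0])   # hypothetical features for one (s, a)

# One semi-gradient TD update toward a target such as r + gamma * max_a' Q-hat(s', a').
alpha, td_target = 0.1, 1.0              # hypothetical step size and TD target
theta += alpha * (td_target - q_hat(x_sa, theta)) * x_sa  # for linear Q-hat, grad = x(s, a)
print(q_hat(x_sa, theta))                # the estimate has moved toward the target
```

Because the function is linear, its gradient with respect to θ is simply the feature vector, which is why the update scales the error by x(s,a).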
Figure: Comparing tabular lookup with function approximation for obtaining Q-values. Function approximation uses a parameterized function to estimate values, enabling scalability and generalization.
Function approximation offers significant advantages:

- Memory efficiency: we store only the parameters θ rather than a value for every state-action pair.
- Generalization: states with similar features receive similar value estimates, so experience in one state improves predictions for states the agent has never visited.
- Continuous inputs: the function can accept real-valued states directly, with no need to discretize them first.
Various types of functions can serve as approximators, including linear functions, decision trees, and tile coding. However, for complex problems involving high-dimensional inputs like images or intricate state representations, deep neural networks have proven exceptionally effective. Their ability to learn hierarchical features and model complex, non-linear relationships makes them powerful Q-function approximators.
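As a preview of what the following sections construct in detail, here is a minimal sketch of a small Q-network in PyTorch. The layer widths and the state/action dimensions are illustrative assumptions; outputting one Q-value per discrete action is the standard DQN convention, since it lets the agent evaluate every action with a single forward pass:

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Maps a state vector to one Q-value estimate per discrete action."""

    def __init__(self, state_dim: int, num_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 64),
            nn.ReLU(),
            nn.Linear(64, num_actions),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

# Illustrative usage: a 4-dimensional state (e.g., CartPole) and 2 actions.
q_net = QNetwork(state_dim=4, num_actions=2)
state = torch.randn(1, 4)               # a batch containing one state
q_values = q_net(state)                 # shape (1, 2): one estimate per action
greedy_action = q_values.argmax(dim=1)  # action with the highest estimated value
```

Producing all action values at once makes greedy action selection a single argmax over the network's output, rather than a separate forward pass per action.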
In this chapter, we will focus specifically on using deep neural networks to approximate the action-value function Q(s,a). This combination forms the basis of Deep Q-Networks (DQN), marking a significant step forward from tabular methods and enabling reinforcement learning to tackle much more complex tasks. We will now look at how to construct and train such a network.