As we saw in previous chapters, algorithms like Q-learning and SARSA work by building a table. For Q-learning, this table stores a Q-value, Q(s,a), for every possible state s and action a. This approach is effective when the number of states and actions is reasonably small, like in simple Gridworld examples. We can directly store and update the value for each entry based on experience.
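To make the tabular idea concrete, here is a minimal sketch of a Q-learning update over a dictionary-backed table, assuming a small discrete environment with hashable states and actions; the hyperparameter values and the `actions` argument are illustrative, not taken from the text.

```python
from collections import defaultdict

ALPHA = 0.1   # learning rate (illustrative value)
GAMMA = 0.99  # discount factor (illustrative value)

# One table entry per (state, action) pair, created lazily with value 0.0.
q_table = defaultdict(float)

def q_learning_update(state, action, reward, next_state, actions):
    """Apply one Q-learning backup to the table entry for (state, action)."""
    best_next = max(q_table[(next_state, a)] for a in actions)
    td_target = reward + GAMMA * best_next
    q_table[(state, action)] += ALPHA * (td_target - q_table[(state, action)])
```

Every visited (state, action) pair gets its own entry, which is exactly what stops scaling once the state space becomes large.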
However, many real-world applications of Reinforcement Learning present a significant challenge: the sheer size of the state space. Consider a board game like Go, a robot whose state includes continuous sensor readings, or a recommender system with millions of users and items; in each case, enumerating every state in a table is hopeless.
Let's visualize how quickly state spaces can grow. If each dimension of our state has D possible values, and we have k dimensions, the total number of states is D^k.
The number of states (D^k) grows exponentially with the number of state dimensions (k), quickly becoming computationally infeasible for tabular methods, even for moderate values per dimension (D). Note the logarithmic scale on the y-axis.
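To make the growth concrete, here is a quick back-of-the-envelope calculation; the values of D and k are chosen purely for illustration.

```python
# Table size D**k for a state with k dimensions, each taking D values.
D = 10  # illustrative number of values per dimension
for k in (2, 4, 8, 16):
    print(f"k = {k:>2}: {D**k:,} table entries")
# k =  2: 100 table entries
# k =  4: 10,000 table entries
# k =  8: 100,000,000 table entries
# k = 16: 10,000,000,000,000,000 table entries
```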
This explosive growth leads to several problems for tabular methods: the table itself may be too large to fit in memory, the agent can never gather enough experience to visit every state (let alone visit each one repeatedly), and a value learned for one state tells us nothing about similar states the agent has not yet encountered.
The way forward is to move from storing explicit values for every state to approximating the value function. We need a mechanism that can generalize from experienced states to unseen states. If we learn that certain states are good or bad, we want to infer that similar states are also likely good or bad.
This is where function approximation comes in. Instead of a table, we use a function (such as a linear function, a neural network, or a decision tree) with a set of parameters w. This function takes a state s (or a state-action pair (s,a)) as input and outputs an estimated value:

$$\hat{v}(s, \mathbf{w}) \approx v_\pi(s) \quad \text{or} \quad \hat{q}(s, a, \mathbf{w}) \approx q_\pi(s, a)$$
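As a concrete illustration, the simplest such function is linear in the parameters. The sketch below assumes states can be mapped to fixed-length NumPy feature vectors; the `features` helper and the feature-vector length are hypothetical placeholders.

```python
import numpy as np

NUM_FEATURES = 8  # illustrative feature-vector length

# Parameter vector w: one weight per feature, not one table entry per state.
w = np.zeros(NUM_FEATURES)

def features(state):
    """Hypothetical feature extractor: map a state to a fixed-length vector.

    In practice this could be tile coding, polynomial features, or the
    output of a learned encoder.
    """
    return np.asarray(state, dtype=float)

def v_hat(state, w):
    """Linear value estimate: v_hat(s, w) = w . x(s)."""
    return float(np.dot(w, features(state)))
```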
The number of parameters in w is typically far smaller than the number of states ∣S∣. For example, we might face billions of states yet need only a few thousand, or perhaps a few million, parameters in our function approximator. By adjusting these parameters w based on experience, we aim to find an approximation that works well across the entire state space, including states the agent has never visited.
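As a preview of how such parameter adjustments look (later sections develop this properly), here is a hedged sketch of a semi-gradient TD(0) step that reuses the `features` and `v_hat` helpers from the previous snippet; the `alpha` and `gamma` values are illustrative.

```python
def semi_gradient_td0_update(state, reward, next_state, w,
                             alpha=0.05, gamma=0.99):
    """One semi-gradient TD(0) step: nudge w toward the bootstrapped target.

    For the linear approximator above, the gradient of v_hat(s, w) with
    respect to w is simply the feature vector x(s).
    """
    td_error = reward + gamma * v_hat(next_state, w) - v_hat(state, w)
    return w + alpha * td_error * features(state)
```

Because each update shifts shared weights rather than a single table cell, experience in one state also changes the estimates for every state with similar features.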
The following sections will explore how we can represent states in a way suitable for these functions and how we can adapt our learning algorithms (like TD learning) to update the parameters w instead of table entries. We will start with simpler linear function approximators before touching upon more complex non-linear ones like neural networks.