Approximating value functions is essential in problems with large state spaces, where states are often represented using feature vectors x(s). One of the simplest and most fundamental approaches is to approximate values as linear functions of these features.
Instead of a table storing a unique value for every state s, we approximate the state-value function vπ(s) using a linear combination of the state's features. We represent this approximation as v^(s,w), where w is a vector of weights or parameters that we need to learn.
Linear Approximation for State Values
The linear approximation for the state-value function is defined as the dot product between the weight vector w and the feature vector x(s) for that state:
$$\hat{v}(s, \mathbf{w}) \doteq \mathbf{w}^\top \mathbf{x}(s) = \sum_{i=1}^{d} w_i x_i(s)$$
Here:
- v^(s,w) is our estimated value for state s.
- w = [w_1, w_2, ..., w_d]^T is the column vector of weights. These are the parameters we adjust during learning to make our approximation v^ closer to the true value vπ.
- x(s) = [x_1(s), x_2(s), ..., x_d(s)]^T is the column vector of features representing state s, as discussed in the previous section.
- d is the number of features. Critically, d is typically much, much smaller than the total number of states ∣S∣, which is why this approach is feasible for large state spaces. We are trading the ability to represent every state's value exactly for a more compact representation that generalizes across states.
The core idea is that states with similar feature vectors should have similar estimated values, determined by the learned weights w.
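To make the formula concrete, here is a minimal NumPy sketch of the dot product above; the feature values and weights are made up purely for illustration, not taken from any particular task.

```python
import numpy as np

def linear_value(x_s: np.ndarray, w: np.ndarray) -> float:
    """Linear state-value estimate: v_hat(s, w) = w^T x(s)."""
    return float(np.dot(w, x_s))

# Hypothetical d = 3 features for some state s and hand-set weights.
x_s = np.array([1.0, 0.5, -2.0])   # feature vector x(s)
w = np.array([0.2, -0.1, 0.05])    # weight vector w (normally learned)
print(linear_value(x_s, w))        # 0.2*1.0 - 0.1*0.5 + 0.05*(-2.0) = 0.05
```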
Linear Approximation for Action Values
Similarly, we can approximate the action-value function qπ(s,a) using a linear function. This requires features that depend on both the state s and the action a. We denote the feature vector for a state-action pair as x(s,a).
The linear approximation for the action-value function is:
$$\hat{q}(s, a, \mathbf{w}) \doteq \mathbf{w}^\top \mathbf{x}(s, a) = \sum_{i=1}^{d} w_i x_i(s, a)$$
Here:
- q^(s,a,w) is the estimated value of taking action a in state s.
- w is again the weight vector to be learned. Depending on how the features are constructed, a single weight vector may be shared across all actions, or separate weights may be kept for each action; for now, assume a single weight vector.
- x(s,a) is the feature vector representing the state-action pair. Constructing these features often involves combining state features with information about the action. For example, if actions are discrete, one could have separate blocks of features that are active only when a specific action is considered (a sketch of this construction follows below), or features could represent the interaction between state properties and action choices.
This formulation allows us to estimate Q-values, which is essential for control algorithms like Q-learning or SARSA when using function approximation.
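The sketch below shows one common way to build x(s, a) for discrete actions: the state features are copied into the block belonging to the chosen action and the other blocks stay at zero (a "stacked" or one-hot construction). The function names, feature values, and random weights are illustrative assumptions, not a prescribed implementation.

```python
import numpy as np

def action_value_features(x_s: np.ndarray, action: int, n_actions: int) -> np.ndarray:
    """Build x(s, a) by copying x(s) into the block of the chosen action
    and leaving the other blocks at zero (one common construction)."""
    d = len(x_s)
    x_sa = np.zeros(d * n_actions)
    x_sa[action * d:(action + 1) * d] = x_s
    return x_sa

def linear_q(x_s: np.ndarray, action: int, n_actions: int, w: np.ndarray) -> float:
    """Linear action-value estimate: q_hat(s, a, w) = w^T x(s, a)."""
    return float(np.dot(w, action_value_features(x_s, action, n_actions)))

# Greedy action selection over 2 discrete actions for a 3-feature state.
x_s = np.array([1.0, 0.5, -2.0])
rng = np.random.default_rng(0)
w = rng.normal(size=3 * 2)               # one weight block per action
q_values = [linear_q(x_s, a, 2, w) for a in range(2)]
greedy_action = int(np.argmax(q_values))
```

With this construction, learning the weights for one action does not directly change the estimates for the other actions, since their feature blocks are zero for that sample.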
Advantages of Linear Methods
Linear function approximation offers several benefits, making it a valuable tool and often a good starting point:
- Simplicity: The mathematical form is straightforward, making it relatively easy to understand, implement, and debug.
- Computational Efficiency: Calculating the approximate value involves a simple dot product. Learning algorithms based on gradient descent (which we'll cover next) are also computationally efficient for linear models; a short derivation of why appears after this list.
- Good Convergence Properties: When combined with gradient-based learning algorithms, linear function approximation has better-understood convergence behavior than more complex non-linear approximators. Convergence to the best possible weights isn't always guaranteed in the RL setting (bootstrapping combined with off-policy training can cause instability), but on-policy methods with linear features are generally stable and well analyzed.
- Interpretability: Sometimes, the learned weights wi can offer insights into the importance of different features xi(s) in determining the value.
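As noted in the efficiency point above, gradient-based learning is especially cheap for linear models because the gradient of the approximation with respect to the weights is simply the feature vector:

$$\nabla_{\mathbf{w}} \hat{v}(s, \mathbf{w}) = \nabla_{\mathbf{w}} \left( \mathbf{w}^\top \mathbf{x}(s) \right) = \mathbf{x}(s)$$

so each update touches only the d weights, costing O(d) per step.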
Figure: A simple linear function fit to sampled state values. The dashed line shows v^(s,w) attempting to fit the sampled state values (blue dots) obtained from experience, plotted against a single state feature.
The Role of Features
It's important to remember that the effectiveness of linear function approximation heavily relies on the chosen features x(s) or x(s,a). If the features capture the aspects of the state (or state-action pair) that truly influence the value, then a linear combination might be a good approximation. However, if the features are poorly chosen, or if the underlying true value function has a complex, non-linear structure, then a linear model will struggle to provide accurate estimates, regardless of how well we learn the weights w. Techniques like tile coding, polynomial bases, or Fourier bases are common ways to construct features that can help linear methods capture some non-linearity.
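As one illustration of such a feature construction, the sketch below builds a Fourier cosine basis for a one-dimensional state scaled to [0, 1]; the normalization, basis order, and function name are assumptions made for the example rather than a canonical recipe.

```python
import numpy as np

def fourier_features(s: float, order: int) -> np.ndarray:
    """Fourier cosine basis for a one-dimensional state scaled to [0, 1]:
    x_i(s) = cos(i * pi * s) for i = 0, ..., order."""
    return np.cos(np.pi * np.arange(order + 1) * s)

# A linear function of these d = order + 1 features can capture smooth,
# non-linear shapes in the underlying value function.
s = 0.3                                  # state already normalized to [0, 1]
x_s = fourier_features(s, order=4)
w = np.zeros_like(x_s)                   # weights to be learned later
v_hat = float(np.dot(w, x_s))
```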
Limitations
The primary limitation of linear methods is their restricted representational capacity. They can only represent linear relationships between the features and the value. If the true value function is highly non-linear, even with the best possible weights, the linear approximation v^(s,w) or q^(s,a,w) might be significantly different from the true values vπ(s) or qπ(s,a). This limitation motivates the use of non-linear function approximators, such as neural networks, which we will introduce later in this chapter.
Furthermore, designing good features often requires significant domain knowledge and engineering effort.
Linear methods provide a foundational understanding of how function approximation works in RL. The next step is to learn how to actually find the weight vector w that makes our approximation as accurate as possible, which typically involves using gradient-based optimization methods.