Now that we understand the necessity of approximating value functions for large problems and how states can be represented using feature vectors x(s), let's explore one of the simplest and most fundamental approaches: using linear functions.
Instead of a table storing a unique value for every state s, we approximate the state-value function vπ(s) using a linear combination of the state's features. We represent this approximation as v^(s,w), where w is a vector of weights or parameters that we need to learn.
The linear approximation for the state-value function is defined as the dot product between the weight vector w and the feature vector x(s) for that state:
$$\hat{v}(s, \mathbf{w}) \doteq \mathbf{w}^\top \mathbf{x}(s) = \sum_{i=1}^{d} w_i x_i(s)$$

Here, $\mathbf{w} \in \mathbb{R}^d$ is the weight vector we learn, $x_i(s)$ is the $i$-th component of the feature vector $\mathbf{x}(s)$, and $d$ is the number of features.
The core idea is that states with similar feature vectors should have similar estimated values, determined by the learned weights w.
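In code, this estimate is just a dot product. The sketch below uses made-up feature and weight values purely for illustration:

```python
import numpy as np

def v_hat(x_s: np.ndarray, w: np.ndarray) -> float:
    """Linear state-value estimate: v_hat(s, w) = w . x(s)."""
    return float(np.dot(w, x_s))

# Hypothetical 3-dimensional feature vector for some state s
x_s = np.array([1.0, 0.5, -2.0])
# Hypothetical learned weights
w = np.array([2.0, 1.0, 0.5])

print(v_hat(x_s, w))  # 2.0*1.0 + 1.0*0.5 + 0.5*(-2.0) = 1.5
```

Note that two different states with identical feature vectors necessarily receive identical estimates, which is exactly the generalization behavior described above.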
Similarly, we can approximate the action-value function qπ(s,a) using a linear function. This requires features that depend on both the state s and the action a. We denote the feature vector for a state-action pair as x(s,a).
The linear approximation for the action-value function is:
$$\hat{q}(s, a, \mathbf{w}) \doteq \mathbf{w}^\top \mathbf{x}(s, a) = \sum_{i=1}^{d} w_i x_i(s, a)$$

Here, $\mathbf{x}(s, a) \in \mathbb{R}^d$ is the feature vector for the state-action pair and $\mathbf{w} \in \mathbb{R}^d$ is again a weight vector to be learned.
This formulation allows us to estimate Q-values, which is essential for control algorithms like Q-learning or SARSA when using function approximation.
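One common way to build state-action features (assumed here for illustration, not the only option) is to allocate a separate block of the feature vector to each action and copy the state features into the block for the chosen action:

```python
import numpy as np

def x_sa(x_s: np.ndarray, a: int, n_actions: int) -> np.ndarray:
    """State-action features: x(s) copied into the block for action a,
    zeros everywhere else. Result has length d * n_actions."""
    d = len(x_s)
    feats = np.zeros(d * n_actions)
    feats[a * d:(a + 1) * d] = x_s
    return feats

def q_hat(x_sa_vec: np.ndarray, w: np.ndarray) -> float:
    """Linear action-value estimate: q_hat(s, a, w) = w . x(s, a)."""
    return float(np.dot(w, x_sa_vec))

# Hypothetical 2-dimensional state features and 3 actions
x_s = np.array([1.0, -0.5])
n_actions = 3
w = np.zeros(2 * n_actions)
w[2:4] = [4.0, 2.0]  # weights for action 1's block (made-up values)

print(q_hat(x_sa(x_s, 1, n_actions), w))  # 4.0*1.0 + 2.0*(-0.5) = 3.0
```

With this construction, each action effectively gets its own set of weights, so the Q-values of different actions in the same state can differ.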
Linear function approximation offers several benefits, making it a valuable tool and often a good starting point: it is simple to implement, cheap to compute and update, its behavior is comparatively easy to analyze, and, when combined with gradient-based learning, it enjoys stronger convergence guarantees than most non-linear approximators.
A simple linear function approximates the relationship between a single state feature and its estimated value. The dashed line represents v^(s,w), attempting to fit the sampled state values (blue dots) obtained from experience.
It's important to remember that the effectiveness of linear function approximation heavily relies on the chosen features x(s) or x(s,a). If the features capture the aspects of the state (or state-action pair) that truly influence the value, then a linear combination might be a good approximation. However, if the features are poorly chosen, or if the underlying true value function has a complex, non-linear structure, then a linear model will struggle to provide accurate estimates, regardless of how well we learn the weights w. Techniques like tile coding, polynomial bases, or Fourier bases are common ways to construct features that can help linear methods capture some non-linearity.
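As a small illustration of the feature-construction point, here is a polynomial basis: raising a scalar state to successive powers lets a model that is *linear in the weights* represent a non-linear (here quadratic) value function. The specific coefficients are made up for the example:

```python
import numpy as np

def poly_features(s: float, degree: int) -> np.ndarray:
    """Polynomial basis features: x(s) = [1, s, s^2, ..., s^degree]."""
    return np.array([s ** i for i in range(degree + 1)])

# With these features, a linear combination w . x(s) can represent
# a quadratic value function, e.g. v(s) = 3 - 2s + 0.5s^2:
w = np.array([3.0, -2.0, 0.5])  # hypothetical learned weights
v_estimate = float(np.dot(w, poly_features(4.0, degree=2)))
print(v_estimate)  # 3 - 2*4 + 0.5*16 = 3.0
```

The model is still linear in $\mathbf{w}$, so all the learning machinery for linear methods applies unchanged; the non-linearity lives entirely in the features.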
The primary limitation of linear methods is their restricted representational capacity. They can only represent linear relationships between the features and the value. If the true value function is highly non-linear, even with the best possible weights, the linear approximation v^(s,w) or q^(s,a,w) might be significantly different from the true values vπ(s) or qπ(s,a). This limitation motivates the use of non-linear function approximators, such as neural networks, which we will introduce later in this chapter.
Furthermore, designing good features often requires significant domain knowledge and engineering effort.
Linear methods provide a foundational understanding of how function approximation works in RL. The next step is to learn how to actually find the weight vector w that makes our approximation as accurate as possible, which typically involves using gradient-based optimization methods.
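As a preview of that gradient-based learning, linear approximators have an especially convenient property: the gradient of $\hat{v}(s, \mathbf{w})$ with respect to $\mathbf{w}$ is simply $\mathbf{x}(s)$. A single stochastic-gradient step toward a sampled value target (the features, target, and step size below are made-up illustrative values) then looks like this:

```python
import numpy as np

def sgd_update(w: np.ndarray, x_s: np.ndarray,
               target: float, alpha: float) -> np.ndarray:
    """One stochastic-gradient step toward a sampled value target.
    For linear v_hat, grad_w v_hat(s, w) = x(s), so the update is:
        w <- w + alpha * (target - w . x(s)) * x(s)
    """
    error = target - np.dot(w, x_s)  # prediction error for this sample
    return w + alpha * error * x_s

w = np.zeros(2)
x_s = np.array([1.0, 2.0])  # hypothetical features for state s
G = 5.0                     # hypothetical sampled return observed from s
w = sgd_update(w, x_s, G, alpha=0.1)
print(w)  # error = 5.0, so w becomes [0.5, 1.0]
```

Repeated updates of this form move $\hat{v}(s, \mathbf{w})$ toward the observed targets, which is the mechanism developed in the sections that follow.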
© 2025 ApX Machine Learning