In Chapter 1, we reviewed how function approximation allows reinforcement learning agents to handle large or continuous state and action spaces where tabular methods are infeasible. Linear function approximation, where the value function V(s) or Q-function Q(s,a) is represented as a linear combination of features, was a significant step:
$$Q(s, a; \theta) \approx \sum_{i=1}^{d} \theta_i \phi_i(s, a) = \theta^\top \phi(s, a)$$

Here, ϕ(s,a) is a feature vector derived from the state s and possibly the action a, and θ is the vector of weights we aim to learn. This approach works reasonably well when we can define a set of features ϕ that capture the essential information for predicting values, and when the true value function is indeed close to linear in those features. Examples include using tile coding or radial basis functions in moderately sized state spaces.
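As a concrete illustration, the sketch below computes Q-values with a linear model over radial basis function features. The feature construction, array shapes, and function names (`rbf_features`, `linear_q`) are illustrative assumptions, not code from this chapter.

```python
import numpy as np

def rbf_features(state, centers, width=0.5):
    """Illustrative hand-designed features: Gaussian bumps centered
    at fixed points in state space (a common choice for small problems)."""
    dists = np.linalg.norm(centers - state, axis=1)
    return np.exp(-(dists ** 2) / (2 * width ** 2))

def linear_q(state, action, theta, centers, n_actions):
    """Q(s, a; theta) = theta^T phi(s, a), where phi(s, a) places the
    state features in the block corresponding to the chosen action."""
    phi_s = rbf_features(state, centers)
    phi = np.zeros(len(phi_s) * n_actions)
    phi[action * len(phi_s):(action + 1) * len(phi_s)] = phi_s
    return float(theta @ phi)

# Example: 2-D state, 4 RBF centers, 3 discrete actions.
centers = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
theta = np.random.default_rng(0).normal(size=4 * 3)
print(linear_q(np.array([0.2, 0.8]), action=1, theta=theta,
               centers=centers, n_actions=3))
```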
However, the effectiveness of linear function approximation hinges critically on the quality of the hand-engineered features ϕ(s,a). This presents a major bottleneck when tackling complex problems with high-dimensional, raw sensory inputs. Consider the challenge of learning to play Atari games directly from screen pixels, a task popularized by DeepMind. The state s is an image (or a short sequence of images), consisting of tens of thousands of raw pixel values per frame; a single 210×160 RGB Atari frame already contains over 100,000 values.
How would you manually design a feature vector ϕ(s) from raw pixels that effectively captures the game situation for predicting Q-values? You might try to detect and track the positions of key objects (the player's avatar, the ball, enemies, projectiles), estimate their velocities from consecutive frames, and encode game-specific quantities such as the score or remaining lives.
While possible for the simplest games, this process quickly becomes complex and brittle: every game demands its own feature set, subtle but decision-relevant visual cues are easy to overlook, and even small changes to the game's appearance can invalidate the detectors entirely. The sketch below illustrates how many game-specific assumptions even a modest feature extractor must hard-code.
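Here is a hypothetical hand-crafted feature extractor for a Pong-like game. Every detail (the brightness threshold, the screen regions, the very existence of a single ball) is an assumption baked in by the designer; change the game, or even its color palette, and the extractor silently breaks.

```python
import numpy as np

def pong_like_features(frame: np.ndarray) -> np.ndarray:
    """Hypothetical hand-engineered features for a Pong-like screen.

    Assumes a grayscale frame of shape (210, 160) in which the ball and
    paddles are bright objects on a dark background. These assumptions are
    exactly the kind of game-specific knowledge a designer must encode.
    """
    bright = frame > 200                       # hard-coded brightness threshold
    if not bright.any():                       # nothing detected: features are meaningless
        return np.zeros(4)

    # "Ball" = bright pixels in the middle of the screen; "paddle" = right edge.
    ball_ys, ball_xs = np.nonzero(bright[:, 20:140])
    ball_y = ball_ys.mean() / 210 if len(ball_ys) else 0.0
    ball_x = (ball_xs.mean() + 20) / 160 if len(ball_xs) else 0.0

    paddle_ys, _ = np.nonzero(bright[:, 140:])
    paddle_y = paddle_ys.mean() / 210 if len(paddle_ys) else 0.0

    # Relative position of ball and paddle, a feature we *guess* matters.
    return np.array([ball_x, ball_y, paddle_y, ball_y - paddle_y])
```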
Essentially, relying on manual feature engineering shifts the burden of representation learning from the algorithm to the human designer. For many problems of interest, especially those involving perception (vision, audio), this is impractical.
Beyond the difficulty of feature design, linear models themselves have fundamental limitations. They assume that the target function (the Q-value) is a linear combination of the provided features, and this assumption often doesn't hold in reality: the usefulness of one feature frequently depends on the value of another, an interaction that no single set of linear weights can express. The short example below makes this concrete.
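As a minimal numerical illustration (the feature values and target Q-values here are invented for the example), consider Q-values that follow an XOR-like interaction between two binary features. The best possible linear fit, even with a bias term, predicts the same value everywhere:

```python
import numpy as np

# Two binary features and XOR-like target Q-values: high only when
# exactly one of the two features is active.
phi = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
q_true = np.array([0.0, 1.0, 1.0, 0.0])

# Least-squares linear fit with a bias term: theta^T [1, phi1, phi2].
X = np.hstack([np.ones((4, 1)), phi])
theta, *_ = np.linalg.lstsq(X, q_true, rcond=None)
q_lin = X @ theta

print("linear predictions:", np.round(q_lin, 3))                 # all 0.5
print("max error:", np.round(np.abs(q_lin - q_true).max(), 3))   # 0.5
```

A network with a single hidden layer of two units can represent this interaction exactly, which is precisely the kind of non-linear structure that motivates moving beyond linear function approximators.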
Diagram: contrasting linear approximation (requiring manual feature design) with deep learning (performing automatic feature learning) for complex state spaces.
These limitations motivate the need for more powerful function approximators capable of both learning relevant features automatically from raw, high-dimensional input and capturing complex, non-linear relationships. Deep neural networks, particularly convolutional neural networks (CNNs) for image data and recurrent neural networks (RNNs) for sequential data, have demonstrated remarkable success in representation learning across various domains.
By using a deep neural network Q(s,a;θ) with weights θ, we replace the manual feature engineering step ϕ(s,a) with a learned transformation. The network itself learns to extract salient features from the input state s and combines them non-linearly to produce Q-value estimates. This ability to learn representations directly from experience is the primary reason for adopting deep learning in reinforcement learning, leading to the development of Deep Q-Networks (DQN) and subsequent advanced algorithms. We will now explore how DQN leverages deep learning to overcome the limitations of linear methods.
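To ground this, here is a minimal sketch of a DQN-style convolutional Q-network in PyTorch. The choice of PyTorch and the exact layer sizes are assumptions for illustration, though they follow the widely cited DQN architecture (stacked 84×84 grayscale frames in, one Q-value per discrete action out); the network consumes raw pixels directly, with no hand-designed ϕ(s,a).

```python
import torch
import torch.nn as nn

class DQNNetwork(nn.Module):
    """Maps a stack of preprocessed frames directly to Q-values Q(s, ·; θ).

    The convolutional layers learn the feature extraction that previously
    had to be hand-engineered; the linear head combines those learned
    features non-linearly into one Q-value per action.
    """
    def __init__(self, n_actions: int, in_channels: int = 4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
        )
        self.head = nn.Sequential(
            nn.Linear(64 * 7 * 7, 512), nn.ReLU(),
            nn.Linear(512, n_actions),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: batch of stacked frames, shape (B, 4, 84, 84), values in [0, 1]
        return self.head(self.features(x))

# Example forward pass on a dummy batch of two states.
net = DQNNetwork(n_actions=6)
q_values = net(torch.rand(2, 4, 84, 84))   # shape (2, 6): one Q-value per action
greedy_actions = q_values.argmax(dim=1)    # greedy action selection
```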