While Q-learning and SARSA provide a solid foundation for understanding reinforcement learning updates, their reliance on tabular representations faces significant hurdles when we move beyond simple problems. These "lookup table" approaches, where we store an explicit value for every single state or state-action pair, work well for environments with a manageable number of discrete states and actions. However, their practicality rapidly diminishes as the complexity of the environment grows.
The primary challenge is often referred to as the Curse of Dimensionality. Imagine trying to apply Q-learning to controlling a robot arm with multiple joints. Each joint's angle could be a state variable. Even if we discretize each angle into, say, 10 possible values, with 6 joints we'd have 10⁶ (one million) possible state configurations. Add velocity for each joint, and the number explodes further. For environments with continuous state variables (like sensor readings or object positions in physical space), the number of states is technically infinite. Discretizing continuous spaces coarsely loses information, while fine discretization leads back to an intractably large number of states.
With 10 discrete values per feature, the number of distinct states grows exponentially as more features are added, which quickly makes storing a table entry for every state infeasible.
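A quick back-of-the-envelope calculation makes this concrete. The sketch below reuses the 10-values-per-feature assumption from above; the number of actions and the bytes per entry are arbitrary illustrative choices, not values from this course:

```python
# Rough count of Q-table entries as more state features are added,
# assuming 10 discrete values per feature. The action count and
# bytes-per-entry below are arbitrary illustrative choices.
BINS_PER_FEATURE = 10
NUM_ACTIONS = 8
BYTES_PER_ENTRY = 8  # one 64-bit float per Q-value

for num_features in (2, 4, 6, 8, 10):
    num_states = BINS_PER_FEATURE ** num_features
    num_entries = num_states * NUM_ACTIONS
    print(f"{num_features:2d} features -> {num_states:>14,d} states, "
          f"~{num_entries * BYTES_PER_ENTRY / 1e9:,.1f} GB of Q-values")
```

Even under these modest assumptions, ten features already demand hundreds of gigabytes just to hold the table, before any learning has happened.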
This exponential growth leads to several practical problems:
Memory Requirements: Storing a massive table requires significant memory. For problems like the game of Go, the number of possible board positions (more than 10¹⁷⁰) vastly exceeds the capacity of any conceivable computer memory. We simply cannot create a Q-table large enough.
Computational Cost: Learning the value for every state requires visiting that state (or state-action pair) multiple times. In huge state spaces, an agent might explore for an extremely long time without ever encountering most states, let alone visiting them often enough to get accurate value estimates. Training becomes computationally prohibitive.
Lack of Generalization: Tabular methods treat each state independently. Learning about one state provides no information about other, potentially very similar, states. For instance, slightly changing the position of an object in a visual scene creates a new state in a tabular representation. The agent has no way to leverage its experience from the nearly identical previous state and must learn the value of this "new" state from scratch, as the sketch after this list illustrates. Intuitively, we want our agents to generalize learned knowledge across similar situations.
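Here is a minimal sketch of the generalization problem, assuming a dictionary-backed Q-table over a discretized continuous state and an arbitrary bin width chosen purely for illustration:

```python
import numpy as np
from collections import defaultdict

# A tabular Q-table keyed by a discretized state: nearby continuous states
# that fall into different bins share no information with each other.
BIN_WIDTH = 0.1                              # hypothetical bin width
q_table = defaultdict(lambda: np.zeros(4))   # 4 actions, values start at 0

def discretize(state):
    """Map a continuous state vector to a hashable table key."""
    return tuple(np.floor(np.asarray(state) / BIN_WIDTH).astype(int))

s1 = [0.49, 1.02]   # two nearly identical continuous states...
s2 = [0.51, 1.02]   # ...that land in different bins

q_table[discretize(s1)][2] = 5.0    # learning about s1...
print(q_table[discretize(s2)][2])   # ...tells us nothing about s2 -> 0.0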
Similar issues arise with large or continuous action spaces. Although action spaces do not always hit the same exponential barriers as state spaces, computing and comparing Q-values for a vast number of discrete actions, or finding the maximum over a continuous action range during the Q-learning update, can also become computationally demanding.
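To see why the maximization step is costly, consider a brute-force workaround for a continuous action: evaluate the current Q estimate on a grid of candidate actions at every single update. The Q-function below is a hypothetical stand-in, chosen only so the example runs:

```python
import numpy as np

# The Q-learning target needs max_a Q(s, a). With a continuous action, a
# brute-force grid search over candidate actions must be repeated for
# every update. q_func is a hypothetical stand-in for the agent's estimate.
def q_func(state, action):
    return -(action - 0.3 * state) ** 2  # arbitrary shape for illustration

state = 0.5
candidate_actions = np.linspace(-1.0, 1.0, 1001)  # 1001 evaluations per update
q_values = q_func(state, candidate_actions)
best_action = candidate_actions[np.argmax(q_values)]
print(f"greedy action ~ {best_action:.3f}, max Q ~ {q_values.max():.4f}")
```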
These limitations clearly indicate that tabular methods are insufficient for many real-world problems characterized by high-dimensional state spaces, continuous variables, or the need for generalization. To overcome these challenges, we need a way to approximate the value function (V(s) or Q(s,a)) instead of storing it explicitly for every entry. This is the core idea behind function approximation, which we will explore in the next chapter, starting with Deep Q-Networks (DQN).
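As a preview of that idea, here is a minimal sketch of a parametric approximator, assuming a simple linear model with a hand-crafted feature map; the feature map and dimensions are hypothetical choices for illustration, not the approach developed in the next chapter:

```python
import numpy as np

# Instead of one table entry per (state, action) pair, keep a small weight
# vector w and estimate Q(s, a) ~ w . phi(s, a) for some feature map phi.
# This feature map (state concatenated with a one-hot action) is a
# hypothetical choice purely for illustration.
STATE_DIM, NUM_ACTIONS = 6, 4

def phi(state, action):
    one_hot = np.eye(NUM_ACTIONS)[action]
    return np.concatenate([np.asarray(state, dtype=float), one_hot])

w = np.zeros(STATE_DIM + NUM_ACTIONS)  # ten parameters cover a continuous space

def q_approx(state, action):
    return w @ phi(state, action)

# Updating w after visiting one state shifts the estimates for all similar
# states at once, which is exactly the generalization tabular methods lack.
```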