As introduced earlier, Recurrent Neural Networks process sequential information step-by-step, maintaining an internal memory. Let's look closely at the basic building block that makes this possible: the simple RNN cell.
Think of the RNN cell as the computational engine that operates at each time step within a sequence. For every element $x_t$ in the input sequence (at time step $t$), the cell takes two inputs:

- The current input vector, $x_t$.
- The hidden state from the previous time step, $h_{t-1}$, which carries the network's memory of the sequence so far.

Using these inputs, the cell performs a calculation to produce two outputs:

- A new hidden state, $h_t$, which is passed on to the next time step.
- An output, $y_t$, for the current time step (when one is needed at that step).
The core computation within a simple RNN cell involves combining the current input and the previous hidden state using learned weights and an activation function. The chapter introduction showed the basic equations:
$$h_t = f(W_{hh} h_{t-1} + W_{xh} x_t + b_h)$$
$$y_t = g(W_{hy} h_t + b_y)$$
Let's break this down:

- $x_t$ is the input vector at time step $t$, and $h_{t-1}$ is the hidden state from the previous step.
- $W_{xh}$, $W_{hh}$, and $W_{hy}$ are the learned weight matrices for the input-to-hidden, hidden-to-hidden, and hidden-to-output transformations; $b_h$ and $b_y$ are the corresponding bias vectors.
- $f$ is the activation function for the hidden state, typically the hyperbolic tangent (tanh), which squashes the values into the range [-1, 1]. This helps regulate the information flow and mitigate some gradient issues (though not completely, as we'll see later).
- $g$ is the output activation function, chosen to match the task (e.g., softmax for classification, linear for regression).

A significant aspect of this architecture is weight sharing. The same weight matrices ($W_{xh}$, $W_{hh}$, $W_{hy}$) and biases ($b_h$, $b_y$) are used at every single time step. This means the network learns a single set of parameters that define how to process an input element and update its memory, regardless of where that element appears in the sequence. This makes RNNs parameter-efficient and capable of generalizing across different sequence lengths.
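To make these equations concrete, here is a minimal NumPy sketch of one cell step. The layer sizes, the random initialization, and the name `rnn_cell_step` are illustrative assumptions for this example, not a specific library API; the output activation is taken to be the identity (the regression case).

```python
import numpy as np

# Illustrative sizes (assumptions for this sketch)
input_size, hidden_size, output_size = 4, 3, 2
rng = np.random.default_rng(0)

# One shared set of parameters, reused at every time step
W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
W_hy = rng.normal(scale=0.1, size=(output_size, hidden_size))
b_h = np.zeros(hidden_size)
b_y = np.zeros(output_size)

def rnn_cell_step(x_t, h_prev):
    """One forward step: combine the current input and the previous hidden state."""
    h_t = np.tanh(W_hh @ h_prev + W_xh @ x_t + b_h)  # f = tanh
    y_t = W_hy @ h_t + b_y                           # g = identity (linear output)
    return h_t, y_t

x_t = rng.normal(size=input_size)    # current input vector
h_prev = np.zeros(hidden_size)       # previous hidden state (zeros at t = 0)
h_t, y_t = rnn_cell_step(x_t, h_prev)
print(h_t.shape, y_t.shape)          # (3,) (2,)
```

Note that the parameters are defined once, outside the step function: the function only describes what happens at a single time step.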
We can visualize a single RNN cell and its connections:
A single RNN cell processes input $x_t$ and previous state $h_{t-1}$ to compute the new state $h_t$ and output $y_t$. Weight matrices ($W_{xh}$, $W_{hh}$, $W_{hy}$) are applied during these transformations.
The real power comes from chaining these cells together, creating the recurrent loop. The hidden state $h_t$ computed at time step $t$ becomes the input $h_{t-1}$ for the cell at time step $t+1$. This "unrolling" in time allows information to flow through the sequence:
An RNN unrolled for three time steps. The hidden state ($h_t$) computed by the cell at each step is passed as input to the cell at the next step ($t+1$). The same cell parameters (weights $W$ and biases $b$) are used at each step.
This unrolled view is conceptually useful, especially when thinking about how gradients flow backward during training (Backpropagation Through Time), which we'll cover next. However, remember that in practice, it's the same cell structure (with the same weights) being applied repeatedly, not distinct copies for each time step. This architecture allows RNNs to process sequences of arbitrary length using a fixed number of parameters.
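Continuing the sketch above, processing a whole sequence is just a loop that reuses `rnn_cell_step` (and its single set of parameters) at every step. The sequence length of 5 and the helper name `run_rnn` are again illustrative assumptions:

```python
def run_rnn(sequence, h_0):
    """Apply the same cell (same W_xh, W_hh, W_hy, b_h, b_y) at every time step."""
    h_t = h_0
    outputs = []
    for x_t in sequence:                  # works for any sequence length
        h_t, y_t = rnn_cell_step(x_t, h_t)
        outputs.append(y_t)
    return np.stack(outputs), h_t

sequence = rng.normal(size=(5, input_size))   # a sequence of 5 input vectors
outputs, h_final = run_rnn(sequence, np.zeros(hidden_size))
print(outputs.shape)                          # (5, 2): one output per time step
```

The loop makes the weight sharing explicit: no matter how long the sequence is, the number of parameters stays fixed, and only the hidden state changes from step to step.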