Building upon the idea of recurrence and hidden states, let's examine the structure of a basic Recurrent Neural Network (RNN). Unlike feedforward networks where information flows strictly in one direction, RNNs incorporate a loop, allowing information from previous steps to persist and influence the current step. This internal memory is what makes them suitable for sequential data.
At the heart of an RNN is the RNN cell. Think of this cell as the fundamental processing unit that gets reused for each element in the input sequence. At each time step $t$, the cell takes two inputs: the current input vector $x_t$ and the hidden state $h_{t-1}$ carried over from the previous time step.
Based on these inputs, the cell performs two main computations: it produces a new hidden state $h_t$ and an output $y_t$. The core mechanism involves combining the current input $x_t$ and the previous hidden state $h_{t-1}$ using specific weight matrices and an activation function.
Let's break down the calculations happening within a simple RNN cell at a single time step $t$:

1. Calculating the Hidden State ($h_t$): The new hidden state $h_t$ is computed by applying a linear transformation to both the current input $x_t$ and the previous hidden state $h_{t-1}$, adding a bias, and then passing the result through a non-linear activation function (commonly tanh or ReLU). The formula is typically:

$$h_t = \tanh(W_{xh} x_t + W_{hh} h_{t-1} + b_h)$$

Here, tanh is a common choice for the activation function, squashing the values to lie between -1 and 1.

2. Calculating the Output ($y_t$): The output $y_t$ at the current time step is typically computed by applying another linear transformation to the newly calculated hidden state $h_t$, adding a bias, and potentially passing it through another activation function depending on the specific task (e.g., a softmax function if the task is classification at each step). The formula is often:

$$y_t = \text{activation}(W_{hy} h_t + b_y)$$

Here, the activation is an output activation function suitable for the problem (e.g., identity for regression, softmax for multi-class classification).

To better visualize how an RNN processes a sequence, we often "unroll" the network through time. This means drawing the network as if it were a deep feedforward network, with one layer corresponding to each time step.
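As a concrete sketch of the loop created by unrolling, the code below applies the two equations above once per time step, reusing the same weight matrices throughout. The dimensions, variable names, random initialization, and the choice of an identity output activation are illustrative assumptions made for this example, not part of any particular library's API.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: input vectors of size 3, hidden state of size 4, output of size 2.
input_size, hidden_size, output_size = 3, 4, 2

# One set of parameters, shared across every time step.
W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))   # input-to-hidden
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))  # hidden-to-hidden
W_hy = rng.normal(scale=0.1, size=(output_size, hidden_size))  # hidden-to-output
b_h = np.zeros(hidden_size)
b_y = np.zeros(output_size)

def rnn_cell(x_t, h_prev):
    """One time step: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h), y_t = W_hy h_t + b_y."""
    h_t = np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)
    y_t = W_hy @ h_t + b_y          # identity output activation (e.g., for regression)
    return h_t, y_t

# A toy sequence of 5 input vectors.
sequence = rng.normal(size=(5, input_size))

h = np.zeros(hidden_size)           # initial hidden state h_0
outputs = []
for x_t in sequence:                # "unrolling": the same cell is applied at every step
    h, y_t = rnn_cell(x_t, h)
    outputs.append(y_t)

print("final hidden state:", h)
print("stacked outputs shape:", np.stack(outputs).shape)  # (5, 2): one output per step
```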
An RNN unrolled through time. Each RNN Cell block represents the processing at one time step, taking the input $x_t$ and the previous hidden state $h_{t-1}$ to produce the output $y_t$ and the next hidden state $h_t$. Notice the hidden state $h$ is passed from one time step to the next (dashed lines). Critically, the same weight matrices ($W_{xh}$, $W_{hh}$, $W_{hy}$) are used across all time steps.
The most important aspect illustrated by unrolling is parameter sharing. The weight matrices ($W_{xh}$, $W_{hh}$, $W_{hy}$) and bias vectors ($b_h$, $b_y$) are the same for every time step. This makes the model efficient, as it doesn't need a separate set of parameters for each input position. It learns a general transformation that can be applied repeatedly across the sequence.
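For example, with an input dimension of 100 and a hidden size of 128 (figures chosen purely for illustration), $W_{xh}$ holds $100 \times 128 = 12{,}800$ weights, $W_{hh}$ holds $128 \times 128 = 16{,}384$, and $b_h$ holds 128, for 29,312 recurrent parameters in total, and that count stays the same whether the sequence contains 10 time steps or 10,000.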
The hidden state $h_t$ acts as the network's memory. It captures information from all the previous time steps ($x_0, x_1, \ldots, x_{t-1}$) and combines it with the current input $x_t$ to influence the current output $y_t$ and the subsequent hidden state $h_{t+1}$. This simple structure allows RNNs to model dependencies within sequences, forming the basis for processing sequential data. However, as we'll see next, this basic architecture faces certain difficulties when dealing with long sequences.
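A small sketch can make this memory effect visible: running the same recurrence on two sequences that differ only in their first element yields different final hidden states, because information from $x_0$ is carried forward through every intermediate $h_t$. As before, the sizes, random weights, and NumPy implementation are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
input_size, hidden_size = 3, 4

# Shared recurrent parameters (illustrative random values).
W_xh = rng.normal(scale=0.5, size=(hidden_size, input_size))
W_hh = rng.normal(scale=0.5, size=(hidden_size, hidden_size))
b_h = np.zeros(hidden_size)

def final_hidden_state(sequence):
    """Run h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h) over the whole sequence."""
    h = np.zeros(hidden_size)
    for x_t in sequence:
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
    return h

sequence_a = rng.normal(size=(6, input_size))
sequence_b = sequence_a.copy()
sequence_b[0] += 1.0                 # perturb only the very first input x_0

h_a = final_hidden_state(sequence_a)
h_b = final_hidden_state(sequence_b)
print("difference caused by x_0:", np.abs(h_a - h_b).max())  # non-zero: x_0 still influences h_T
```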