The structure of a basic Recurrent Neural Network (RNN) incorporates recurrence and hidden states. Unlike feedforward networks where information flows strictly in one direction, RNNs include a loop, allowing information from previous steps to persist and influence the current step. This internal memory makes them suitable for sequential data.
At the foundation of an RNN is the RNN cell. Think of this cell as the fundamental processing unit that gets reused for each element in the input sequence. At each time step $t$, the cell takes two inputs:

- the current input vector, $x_t$
- the hidden state from the previous time step, $h_{t-1}$
Based on these inputs, the cell performs two main computations:

- it computes a new hidden state, $h_t$
- it computes an output for the current time step, $y_t$
The core mechanism involves combining the current input and the previous hidden state using specific weight matrices and an activation function.
Let's break down the calculations happening within a simple RNN cell at a single time step $t$:
Calculating the Hidden State ($h_t$): The new hidden state $h_t$ is computed by applying a linear transformation to both the current input $x_t$ and the previous hidden state $h_{t-1}$, adding a bias, and then passing the result through a non-linear activation function (commonly tanh or ReLU).
The formula is typically:

$$h_t = \tanh(W_{xh} x_t + W_{hh} h_{t-1} + b_h)$$
Here:

- $x_t$ is the input vector at time step $t$
- $h_{t-1}$ is the hidden state from the previous time step
- $W_{xh}$ is the weight matrix applied to the input
- $W_{hh}$ is the weight matrix applied to the previous hidden state
- $b_h$ is the bias vector for the hidden state computation
tanh is a common choice for the activation function, squashing the values to be between -1 and 1.

Calculating the Output ($y_t$): The output $y_t$ at the current time step is typically computed by applying another linear transformation to the newly calculated hidden state $h_t$, adding a bias, and potentially passing it through another activation function depending on the specific task (e.g., a softmax function if the task is classification at each step).
The formula is often:

$$y_t = \text{activation}(W_{hy} h_t + b_y)$$
Here:

- $h_t$ is the hidden state just computed for time step $t$
- $W_{hy}$ is the weight matrix mapping the hidden state to the output
- $b_y$ is the bias vector for the output computation
- activation is an output activation function suitable for the problem (e.g., identity for regression, softmax for multi-class classification)
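To make these two computations concrete, here is a minimal NumPy sketch of a single RNN cell step. The dimensions, the random initialization, and the helper name `rnn_cell_step` are illustrative assumptions; the weight names `W_xh`, `W_hh`, and `W_hy` mirror the formulas above, and the identity output activation is just one possible choice.

```python
import numpy as np

# Illustrative sizes (assumptions for this sketch, not prescribed by the architecture)
rng = np.random.default_rng(0)
input_size, hidden_size, output_size = 4, 3, 2

# Shared parameters of the cell
W_xh = rng.standard_normal((hidden_size, input_size)) * 0.1   # input-to-hidden weights
W_hh = rng.standard_normal((hidden_size, hidden_size)) * 0.1  # hidden-to-hidden weights
W_hy = rng.standard_normal((output_size, hidden_size)) * 0.1  # hidden-to-output weights
b_h = np.zeros(hidden_size)                                   # hidden bias
b_y = np.zeros(output_size)                                   # output bias

def rnn_cell_step(x_t, h_prev):
    """Compute the new hidden state and the output for one time step."""
    h_t = np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)  # h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h)
    y_t = W_hy @ h_t + b_y                           # identity output activation (regression-style)
    return h_t, y_t

x_t = rng.standard_normal(input_size)  # one input vector
h_prev = np.zeros(hidden_size)         # initial hidden state h_0
h_t, y_t = rnn_cell_step(x_t, h_prev)
print(h_t.shape, y_t.shape)            # (3,) (2,)
```

For a classification task you would replace the identity output with a softmax over the logits; the hidden state update itself stays the same.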
To better visualize how an RNN processes a sequence, we often "unroll" the network through time. This means drawing the network as if it were a deep feedforward network, with one layer corresponding to each time step.

An RNN unrolled through time. Each RNN Cell block represents the processing at one time step, taking the input $x_t$ and the previous hidden state $h_{t-1}$ to produce the output $y_t$ and the next hidden state $h_t$. Notice the hidden state is passed from one time step to the next (dashed lines). Critically, the same weight matrices ($W_{xh}$, $W_{hh}$, $W_{hy}$) are used across all time steps.
The most important aspect illustrated by unrolling is parameter sharing. The weight matrices ($W_{xh}$, $W_{hh}$, $W_{hy}$) and bias vectors ($b_h$, $b_y$) are the same for every time step. This makes the model efficient, as it doesn't need a separate set of parameters for each input position. It learns a general transformation that can be applied repeatedly across the sequence.
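As a rough sketch of what unrolling looks like in code, the loop below applies one set of parameters to every element of a sequence; only the hidden state changes from step to step. The sizes, the random input sequence, and the initialization are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
input_size, hidden_size, output_size, seq_len = 4, 3, 2, 5

# One set of parameters, shared across all time steps
W_xh = rng.standard_normal((hidden_size, input_size)) * 0.1
W_hh = rng.standard_normal((hidden_size, hidden_size)) * 0.1
W_hy = rng.standard_normal((output_size, hidden_size)) * 0.1
b_h = np.zeros(hidden_size)
b_y = np.zeros(output_size)

inputs = rng.standard_normal((seq_len, input_size))  # x_1, ..., x_T (random stand-in data)
h_t = np.zeros(hidden_size)                          # h_0: the initial, "empty" memory

outputs = []
for x_t in inputs:
    # The same weight matrices and biases are reused at every step (parameter sharing);
    # only h_t carries information forward from earlier inputs.
    h_t = np.tanh(W_xh @ x_t + W_hh @ h_t + b_h)
    y_t = W_hy @ h_t + b_y
    outputs.append(y_t)

outputs = np.stack(outputs)  # shape: (seq_len, output_size)
print(outputs.shape)         # (5, 2)
```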
The hidden state $h_{t-1}$ acts as the network's memory. It captures information from all the previous time steps ($x_1, \dots, x_{t-1}$) and combines it with the current input $x_t$ to influence the current output $y_t$ and the subsequent hidden state $h_t$. This simple structure allows RNNs to model dependencies within sequences, forming the basis for processing sequential data. However, as we'll see next, this basic architecture faces certain difficulties when dealing with long sequences.