While feedforward networks, including the CNNs discussed previously, process inputs independently, many real-world problems involve sequences where the order matters and context from previous items influences the current one. Think about understanding a sentence, predicting stock prices, or transcribing speech. Each word, price point, or sound segment depends on what came before it. Standard feedforward networks lack an inherent mechanism to "remember" past information within a sequence.
This is where Recurrent Neural Networks (RNNs) come in. They are specifically designed to handle sequential data by introducing the concept of recurrence.
The defining feature of an RNN is its internal loop. At each step in processing a sequence, the network considers not only the current input but also information it has retained from previous steps. This retained information is stored in what's called the hidden state.
Imagine reading a sentence. You don't process each word in isolation. Your understanding of the current word is heavily influenced by the words you've already read. The hidden state in an RNN acts like this running summary or context. It captures information about the preceding elements in the sequence.
An RNN processes a sequence one element (or "time step") at a time. For each time step $t$:

- It takes the current input $x_t$ together with the hidden state carried over from the previous step, $h_{t-1}$.
- It combines them to compute a new hidden state $h_t$, which summarizes the sequence seen so far.
- Optionally, it produces an output $y_t$ from $h_t$.
Significantly, the same set of weights (the rules for combining the input with the previous state, and for producing the output) is used at every time step. This weight sharing makes RNNs parameter-efficient and lets them generalize patterns across sequences of different lengths.
It's often helpful to visualize an RNN by "unrolling" it through time. Instead of drawing the loop, we can draw a chain representing the network's state at each time step.
An RNN "unrolled" in time. The same RNN Cell (representing the shared weights) processes input xt and the previous hidden state ht−1 to produce the new hidden state ht and an optional output yt. The hidden state is passed from one time step to the next.
Mathematically, the core computations inside a simple RNN cell at time step $t$ are often written as follows.

Calculate the new hidden state $h_t$:

$$h_t = \tanh(W_{hh} h_{t-1} + W_{xh} x_t + b_h)$$

Calculate the output $y_t$:

$$y_t = W_{hy} h_t + b_y$$

Here:

- $x_t$ is the input vector at time step $t$.
- $h_{t-1}$ is the hidden state from the previous time step, and $h_t$ is the new hidden state.
- $W_{xh}$, $W_{hh}$, and $W_{hy}$ are the input-to-hidden, hidden-to-hidden (recurrent), and hidden-to-output weight matrices.
- $b_h$ and $b_y$ are bias vectors.
- $\tanh$ is the activation function, applied element-wise.
The important part is the recurrent formula for $h_t$, which depends on both the current input $x_t$ and the previous hidden state $h_{t-1}$. This dependence is what gives RNNs their memory.
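To make these two formulas concrete, here is a minimal sketch that applies them step by step to a toy sequence with randomly initialized weights. The sizes, and names such as W_xh, are illustrative choices rather than anything prescribed by a library.

```python
import torch

# Toy sizes, chosen only for illustration
input_size, hidden_size, output_size = 4, 3, 2
seq_len = 5

# The same weights and biases are reused at every time step
W_xh = torch.randn(hidden_size, input_size) * 0.1   # input-to-hidden
W_hh = torch.randn(hidden_size, hidden_size) * 0.1  # hidden-to-hidden (recurrent)
W_hy = torch.randn(output_size, hidden_size) * 0.1  # hidden-to-output
b_h = torch.zeros(hidden_size)
b_y = torch.zeros(output_size)

x_seq = torch.randn(seq_len, input_size)  # one sequence with 5 time steps
h = torch.zeros(hidden_size)              # initial hidden state h_0

for t in range(seq_len):
    # h_t = tanh(W_hh h_{t-1} + W_xh x_t + b_h)
    h = torch.tanh(W_hh @ h + W_xh @ x_seq[t] + b_h)
    # y_t = W_hy h_t + b_y
    y = W_hy @ h + b_y
    print(f"step {t}: h = {h.numpy().round(3)}, y = {y.numpy().round(3)}")
```

Notice that only `h` changes from step to step; the weight matrices are fixed, which is exactly the weight sharing described earlier.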
RNNs excel in tasks involving sequential patterns, such as:

- Natural language processing, where understanding a word depends on the words that came before it.
- Time-series forecasting, such as predicting stock prices from their history.
- Speech recognition, where each sound segment depends on the preceding audio.
While powerful, simple RNNs like the one described above can struggle with learning long-range dependencies. Information from early time steps can get diluted or lost as it propagates through many steps, a problem often referred to as the vanishing gradient problem. Conversely, gradients can sometimes grow excessively large, known as the exploding gradient problem.
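As a rough illustration of the vanishing gradient problem, the sketch below backpropagates through a long chain of recurrent updates and compares the gradient norms at the first and last time steps. With small recurrent weights the early gradient typically shrinks toward zero, while scaling the recurrent matrix up tends to make it grow; the exact numbers depend on the random initialization, so treat this as a qualitative demonstration only.

```python
import torch

hidden_size, seq_len = 8, 100

# Small recurrent weights; try multiplying W_hh by 2.0 to see gradients grow instead
W_hh = torch.randn(hidden_size, hidden_size) * 0.1
W_xh = torch.randn(hidden_size, hidden_size) * 0.1

x_seq = torch.randn(seq_len, hidden_size, requires_grad=True)
h = torch.zeros(hidden_size)

# Run the recurrence over a long sequence
for t in range(seq_len):
    h = torch.tanh(W_hh @ h + W_xh @ x_seq[t])

# Backpropagate a scalar function of the final hidden state
h.sum().backward()

print("gradient norm w.r.t. first input:", x_seq.grad[0].norm().item())
print("gradient norm w.r.t. last input: ", x_seq.grad[-1].norm().item())
```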
These challenges led to the development of more sophisticated recurrent architectures like Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRU), which use gating mechanisms to better control the flow of information and memory. We will briefly touch upon these later in the chapter.
For now, understanding the fundamental concept of recurrence, the role of the hidden state, and the step-by-step processing is sufficient. In the following sections, we will see how to implement a basic RNN using PyTorch's nn.RNN module.
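As a quick preview, a minimal call to nn.RNN looks like the sketch below; the batch size, sequence length, and feature dimensions are arbitrary, and the following sections explain each piece in detail.

```python
import torch
import torch.nn as nn

# A single-layer RNN: 4 input features, 3 hidden units
rnn = nn.RNN(input_size=4, hidden_size=3, batch_first=True)

x = torch.randn(2, 5, 4)   # batch of 2 sequences, 5 time steps, 4 features each
output, h_n = rnn(x)       # output: hidden state at every step; h_n: final hidden state

print(output.shape)  # torch.Size([2, 5, 3])
print(h_n.shape)     # torch.Size([1, 2, 3])
```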