As we discussed in the chapter introduction, many real-world problems involve sequential data where the order of elements is fundamental to understanding the information. Think about understanding a sentence, predicting the next word someone will type, analyzing stock market trends over time, or interpreting sensor readings from a machine. Standard feedforward neural networks, like the Dense networks or even the CNNs we've seen, process inputs independently. They don't inherently maintain information about past inputs when processing the current one, which is a significant limitation for sequential tasks.
Recurrent Neural Networks (RNNs) are designed specifically to address this challenge. The core idea behind an RNN is the concept of recurrence: the network performs the same operation for every element of a sequence, but the output for each element depends not only on the current input but also on the results from previous elements. This is achieved by introducing a "loop" within the network architecture.
Imagine processing a sentence word by word. To understand the meaning of the word "it" in "The cat chased the mouse, and then it ran away," you need to remember whether "it" refers to the cat or the mouse. An RNN accomplishes this kind of context retention through an internal hidden state (often denoted as h).
At each step $t$ in the sequence (e.g., processing the $t$-th word or the $t$-th time point), the RNN takes two inputs:
1. The current element of the sequence, $x_t$.
2. The hidden state from the previous step, $h_{t-1}$.
It then computes the new hidden state $h_t$ and, optionally, an output $y_t$. The crucial part is that the computation of $h_t$ uses $h_{t-1}$. This creates a dependency chain where information from earlier steps can propagate through the sequence via the hidden state. This allows the network to have a form of "memory," retaining context from past elements.
Conceptually, the update rule for the hidden state at time step $t$ can be represented as:

$$h_t = f(W_{hh} h_{t-1} + W_{xh} x_t + b_h)$$

And the output at time step $t$ (if needed at each step) as:

$$y_t = g(W_{hy} h_t + b_y)$$

Here:

- $x_t$ is the input at time step $t$, and $h_{t-1}$ is the hidden state carried over from the previous step.
- $W_{xh}$, $W_{hh}$, and $W_{hy}$ are weight matrices; $b_h$ and $b_y$ are bias vectors.
- $f$ and $g$ are activation functions (commonly `tanh` or `relu` for $f$, and `softmax` or `sigmoid` for $g$, depending on the task).

The same set of weights ($W_{xh}$, $W_{hh}$, $W_{hy}$) and biases ($b_h$, $b_y$) is used across all time steps. This parameter sharing makes RNNs efficient and capable of generalizing patterns across different positions in the sequence.
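To make these equations concrete, here is a minimal NumPy sketch of the forward recurrence. The function name, the shapes, and the identity output activation are illustrative choices for this sketch, not part of any library API.

```python
import numpy as np

def rnn_forward(x_seq, h0, Wxh, Whh, Why, bh, by):
    """Run a simple RNN forward over a sequence.

    x_seq: array of shape (T, input_dim), one row per time step
    h0:    initial hidden state, shape (hidden_dim,)
    Returns the hidden states and outputs for every time step.
    """
    h = h0
    hidden_states, outputs = [], []
    for x_t in x_seq:                             # the same weights are reused at every step
        h = np.tanh(Whh @ h + Wxh @ x_t + bh)     # h_t = f(Whh h_{t-1} + Wxh x_t + bh)
        y = Why @ h + by                          # y_t = g(Why h_t + by); g is the identity here
        hidden_states.append(h)
        outputs.append(y)
    return np.stack(hidden_states), np.stack(outputs)

# Illustrative shapes: 5 time steps, 3 input features, 4 hidden units, 2 outputs.
rng = np.random.default_rng(0)
T, input_dim, hidden_dim, output_dim = 5, 3, 4, 2
x_seq = rng.normal(size=(T, input_dim))
params = dict(
    Wxh=rng.normal(size=(hidden_dim, input_dim)) * 0.1,
    Whh=rng.normal(size=(hidden_dim, hidden_dim)) * 0.1,
    Why=rng.normal(size=(output_dim, hidden_dim)) * 0.1,
    bh=np.zeros(hidden_dim),
    by=np.zeros(output_dim),
)
hs, ys = rnn_forward(x_seq, np.zeros(hidden_dim), **params)
print(hs.shape, ys.shape)  # (5, 4) (5, 2)
```

Note how the loop body is identical at every step: only the input $x_t$ and the carried hidden state $h$ change, which is exactly the parameter sharing described above.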
To better visualize the flow of information, it's common to "unroll" the RNN loop over the sequence length. Imagine the sequence has T time steps. Unrolling means creating T copies of the network, one for each time step, and connecting the hidden state output of one step to the hidden state input of the next.
An RNN unrolled through time. Each `RNN Cell` block represents the same set of weights applied at different time steps. The hidden state $h$ carries information from one step to the next. Input $x$ and output $y$ occur at each step.
This unrolled view makes it clearer how gradients flow during backpropagation (backpropagation through time, or BPTT) and why capturing long-range dependencies can sometimes be difficult, leading to issues like the vanishing gradient problem, which we'll discuss later.
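To build some intuition for the vanishing gradient issue, the sketch below is a simplified illustration, not the full BPTT computation: it repeatedly applies the transposed recurrent weight matrix to a gradient vector, which is roughly what happens as the gradient travels backwards through the unrolled steps. The matrix scale of 0.5 is an arbitrary assumption for this demo, and the tanh derivative factor (which is at most 1) is omitted, so a real gradient would shrink at least this fast.

```python
import numpy as np

rng = np.random.default_rng(1)
hidden_dim = 4

# Recurrent weight matrix rescaled so its largest singular value is 0.5 (< 1).
# Any such matrix shrinks every vector it multiplies, step after step.
Whh = rng.normal(size=(hidden_dim, hidden_dim))
Whh *= 0.5 / np.linalg.norm(Whh, 2)

grad = np.ones(hidden_dim)  # pretend gradient arriving at the final time step
for step in range(1, 21):
    # Going one step further back in time multiplies the gradient by Whh^T
    # (and by the tanh derivative, which is at most 1 and omitted here).
    grad = Whh.T @ grad
    if step % 5 == 0:
        print(f"{step:2d} steps back: |gradient| = {np.linalg.norm(grad):.2e}")
```

The gradient magnitude decays geometrically with the number of steps, which is why plain RNNs struggle to learn dependencies spanning many time steps.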
RNNs are flexible in handling different sequence input/output relationships:

- Many-to-one: the network reads an entire sequence and produces a single output (e.g., classifying the sentiment of a sentence).
- One-to-many: a single input produces a sequence of outputs (e.g., generating a caption from an image).
- Many-to-many: a sequence is mapped to another sequence, either step by step (e.g., tagging each word) or after reading the full input (e.g., machine translation).
The core RNN concept provides the foundation for these different patterns. In Keras, layers like `SimpleRNN`, `LSTM`, and `GRU` implement this recurrent behavior. We will explore how to use these layers, starting with the basic `SimpleRNN`, in the next section.
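As a brief preview, here is a minimal Keras sketch showing how the `return_sequences` argument of `SimpleRNN` selects between the many-to-one and many-to-many patterns. The layer sizes, sequence length, and feature count are arbitrary placeholders.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Placeholder dimensions: sequences of 10 time steps, 8 features per step.
timesteps, features = 10, 8
dummy = np.random.random((4, timesteps, features)).astype("float32")  # batch of 4

# Many-to-one: return_sequences=False (the default) keeps only the final
# hidden state, suitable for tasks like sequence classification.
many_to_one = keras.Sequential([
    layers.Input(shape=(timesteps, features)),
    layers.SimpleRNN(16),
    layers.Dense(1, activation="sigmoid"),
])
print(many_to_one(dummy).shape)  # (4, 1)

# Many-to-many: return_sequences=True emits a hidden state at every time step.
many_to_many = keras.Sequential([
    layers.Input(shape=(timesteps, features)),
    layers.SimpleRNN(16, return_sequences=True),
    layers.Dense(1, activation="sigmoid"),
])
print(many_to_many(dummy).shape)  # (4, 10, 1)
```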