As we discussed in the previous chapter, standard feedforward neural networks have a fundamental limitation when dealing with sequential data: they lack memory. Each input is processed independently, without regard for the order or context provided by previous inputs. Think about reading this sentence. To understand the meaning of the word "sentence", your brain uses the context provided by the preceding words "reading this". Feedforward networks can't naturally do this. They are designed for fixed-size inputs where order doesn't inherently carry meaning in the same way.
This brings us to the central idea behind Recurrent Neural Networks (RNNs). Instead of processing the entire sequence at once, or processing each element in isolation, RNNs process sequences one element at a time, iteratively.
Imagine processing a sequence $x = (x_1, x_2, \ldots, x_T)$. An RNN takes the first element $x_1$, processes it, and produces an internal state (often called the hidden state), let's call it $h_1$. When it's time to process the second element $x_2$, the RNN doesn't just look at $x_2$. It also considers the information it gleaned from the first step, summarized in the hidden state $h_1$. It combines the new input $x_2$ with the previous state $h_1$ to compute the next hidden state $h_2$.
This process repeats for every element in the sequence, with each step folding the new input into the running hidden state.
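Written in its most general form, each step computes

$$h_t = f(h_{t-1}, x_t), \qquad t = 1, \ldots, T,$$

where the symbol $f$ is just a label for the RNN cell's combination rule; its concrete definition in terms of weights and an activation function is covered in the following sections. The initial state $h_0$ is commonly taken to be a vector of zeros, so the first step starts from an "empty" memory.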
The key is the hidden state $h_t$, which acts as the network's memory. It captures information about all the preceding elements $(x_1, \ldots, x_{t-1})$ that the network deems relevant for processing the current element $x_t$ and future elements. This state is passed from one time step to the next, creating a loop or recurrence in the network's connections.
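To make the loop itself concrete (not the math inside the cell), here is a minimal Python/NumPy sketch. The function name `rnn_step` and the stand-in tanh-of-a-sum combination are illustrative assumptions only; the actual cell computation is introduced in the following sections.

```python
import numpy as np

def rnn_step(x_t, h_prev):
    # Placeholder for the "RNN Operation": it combines the current input
    # x_t with the previous hidden state h_prev to produce h_t. The tanh
    # of an elementwise sum is only a stand-in so the sketch runs; the
    # real weighted combination is defined in the following sections.
    return np.tanh(x_t + h_prev)

def process_sequence(xs):
    # xs: array of shape (T, d), one d-dimensional input per time step.
    h = np.zeros(xs.shape[1])       # h_0: the initial, "empty" memory
    hidden_states = []
    for x_t in xs:                  # one element at a time, in order
        h = rnn_step(x_t, h)        # h_t depends on x_t and on h_{t-1}
        hidden_states.append(h)     # the state is carried to the next step
    return np.stack(hidden_states)  # h_1, ..., h_T

# Example: a toy sequence of 5 steps with 3 features each.
states = process_sequence(np.random.randn(5, 3))
```

The essential point is structural: the variable `h` persists across loop iterations, so each step sees a summary of everything processed so far.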
A conceptual view of an RNN processing one step. The input $x_t$ and the previous hidden state $h_{t-1}$ are combined by the RNN operation to produce the current hidden state $h_t$. This state $h_t$ is then carried forward to be used in the next time step (processing $x_{t+1}$), and an optional output $y_t$ can be generated at this step.
This iterative processing with a persistent state is fundamentally different from feedforward networks. While a feedforward network applies the same transformation to every input independently, an RNN applies the same transformation rule (the same set of weights within the RNN cell) at each step, but the outcome (the hidden state $h_t$ and output $y_t$) depends on both the current input $x_t$ and the history summarized in $h_{t-1}$.
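The contrast can be sketched side by side. Nothing below is prescribed by the text above: the tanh-of-a-weighted-sum cell is just one common formulation, used here so the two layers can be compared directly, and the exact cell equations are the subject of the following sections.

```python
import numpy as np

def feedforward_layer(xs, W, b):
    # A feedforward layer applies the same transformation to each input
    # independently: reordering the inputs only reorders the outputs.
    return [np.tanh(x @ W + b) for x in xs]

def recurrent_layer(xs, W_xh, W_hh, b, h0):
    # An RNN reuses one set of weights (W_xh, W_hh, b) at every step, but
    # each new hidden state also depends on the state carried over from
    # the previous step, so earlier inputs influence later outputs.
    h, states = h0, []
    for x in xs:
        h = np.tanh(x @ W_xh + h @ W_hh + b)  # h_t from x_t and h_{t-1}
        states.append(h)
    return states
```

With fixed weights, shuffling the inputs leaves the feedforward outputs unchanged apart from their order, while the recurrent outputs change, because each state folds in the history of everything seen before it.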
This ability to maintain context makes RNNs naturally suited for tasks where order matters and information needs to be aggregated over time, like understanding language, forecasting time series, or analyzing audio signals. The specific mathematical functions used inside the "RNN Operation" box in the diagram define how the previous state and current input are combined, and we will look into those details in the following sections.