Sequence prediction is a fundamental sequence modeling task: the goal is to forecast future elements of a sequence from its past elements. Having learned the mechanics of RNNs, LSTMs, and GRUs, we can now look at how to configure these networks for prediction tasks, most notably time series forecasting, but also problems like predicting the next word in a sentence or the next note in a melody.
The core idea is straightforward: we feed a segment of the sequence history into the recurrent network, and the network learns to output one or more future values. The network's internal state, maintained across time steps, allows it to capture temporal dependencies within the input sequence, which are essential for making accurate predictions.
Different prediction problems require different input-output mappings. Let's look at the common patterns:
Many-to-One Prediction: This is perhaps the most common setup for basic forecasting. The network receives a sequence of input values (the history) and predicts a single value for the next time step.
The return_sequences parameter in the recurrent layer (or the framework's equivalent) is usually set to False because we only need the last output.
Many-to-Many Prediction (Direct Multi-step): In this approach, the network takes a sequence of inputs and directly predicts a sequence of future outputs.
Many-to-Many Prediction (Iterative Multi-step): Another way to predict multiple steps ahead is to do so iteratively. A Many-to-One model is trained to predict just the next time step. To predict further ahead, the model's prediction for step t+1 is appended to the input sequence, which is then used to predict step t+2, and so on (a sketch of this loop appears near the end of this section).
Many-to-Many Prediction (Aligned Output - Less common for forecasting): Here, the network produces an output for each input time step. This is more typical for tasks like sequence labeling (e.g., part-of-speech tagging) than standard forecasting, but it's a possible structure. For prediction, it might mean predicting a processed version of the input at each step.
This requires the recurrent layer to return its output at every time step (return_sequences=True). These outputs are then typically processed by a TimeDistributed wrapper around a Dense layer (or equivalent mechanism) to produce an output for each time step.
Let's focus on the common Many-to-One setup for predicting the next value in a time series.
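Before doing so, here is a minimal sketch of the aligned-output configuration just described, since it will not come up again in this section. It assumes a Keras-style API; the window length, feature count, and 32-unit layer size are arbitrary illustrative values, not prescriptions from this chapter.

```python
import tensorflow as tf

k, num_features = 10, 1  # hypothetical window length and feature count

# Aligned output: one prediction per input time step.
# A Many-to-One model would instead use return_sequences=False and a plain Dense layer.
aligned_model = tf.keras.Sequential([
    tf.keras.Input(shape=(k, num_features)),
    tf.keras.layers.LSTM(32, return_sequences=True),            # shape: (batch, k, 32)
    tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(1)),  # shape: (batch, k, 1)
])
```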
Input Shaping: As discussed in Chapter 8, time series data is often prepared using a sliding window approach. If we want to predict the value at time $t$ using the previous $k$ values, our input sequence is $[x_{t-k}, x_{t-k+1}, \ldots, x_{t-1}]$ and the target output is $x_t$. Each input sample fed to the RNN has the shape (k, num_features), where num_features is the number of data streams measured at each time step (e.g., 1 for just temperature, or more if pressure, humidity, and so on are included). Batched input has the shape (batch_size, k, num_features).
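A minimal sketch of this windowing step for a univariate series, assuming NumPy; the function name make_windows and the example series are hypothetical:

```python
import numpy as np

def make_windows(series, k):
    """Slice a 1-D series into (window of k past values, next value) pairs."""
    X, y = [], []
    for t in range(k, len(series)):
        X.append(series[t - k:t])  # the k values before time t
        y.append(series[t])        # the value to predict
    X = np.array(X)[..., np.newaxis]  # shape: (num_samples, k, 1), so num_features = 1
    y = np.array(y)
    return X, y

# Example: 100 observations, windows of the 10 most recent values.
series = np.sin(np.linspace(0.0, 20.0, 100))
X, y = make_windows(series, k=10)
print(X.shape, y.shape)  # (90, 10, 1) (90,)
```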
Model Structure: The input layer accepts sequences of shape (k, num_features). One or more recurrent layers (LSTM, GRU, or SimpleRNN) then process the sequence, with the final recurrent layer using return_sequences=False so that only its last output is kept. The number of units in these layers is a hyperparameter to tune. A Dense layer maps that final output to the single predicted value.
A typical Many-to-One architecture for sequence prediction. The recurrent layer processes the input sequence, and its final hidden state is passed to a Dense layer to produce the single predicted value.
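Putting this together, a minimal sketch of such a Many-to-One model, assuming a Keras-style API; the choice of an LSTM with 32 units, the optimizer, and the loss are illustrative, and any of the recurrent layers covered earlier would fit:

```python
import tensorflow as tf

k, num_features = 10, 1  # hypothetical window length and feature count

model = tf.keras.Sequential([
    tf.keras.Input(shape=(k, num_features)),            # one window of history
    tf.keras.layers.LSTM(32, return_sequences=False),   # keep only the final hidden state
    tf.keras.layers.Dense(1),                            # the single predicted value
])
model.compile(optimizer="adam", loss="mse")

# X and y as produced by the windowing sketch above:
# model.fit(X, y, epochs=20, batch_size=32)
```

For the direct multi-step variant described earlier, the final Dense layer would output one value per step of the forecast horizon rather than a single value.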
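The iterative multi-step pattern from earlier in the section reuses a trained Many-to-One model like the one above. A sketch, assuming a univariate history window of shape (k, 1); the helper name predict_iteratively is hypothetical:

```python
import numpy as np

def predict_iteratively(model, history, n_steps):
    """Predict n_steps ahead by feeding each prediction back into the window.

    history: array of shape (k, 1), the most recent k observations.
    """
    window = history.copy()
    predictions = []
    for _ in range(n_steps):
        # The model expects a batch dimension: (1, k, 1).
        next_value = model.predict(window[np.newaxis, ...], verbose=0)[0, 0]
        predictions.append(next_value)
        # Slide the window: drop the oldest value, append the new prediction.
        window = np.concatenate([window[1:], [[next_value]]], axis=0)
    return np.array(predictions)
```

Note that prediction errors compound as they are fed back in, so accuracy typically degrades the further ahead this loop runs.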
These fundamental approaches form the basis for applying RNNs to a wide variety of sequence prediction problems. The next sections will explore how similar architectures can be adapted for classification and generation tasks.