Now that we've covered how to define SimpleRNN layers using framework APIs and examined their input/output shape requirements, let's put it all together with a hands-on example. We'll tackle a straightforward sequence prediction task: predicting the next number in a simple arithmetic sequence. This exercise will solidify your understanding of building and training a basic RNN.
Our goal is to train an RNN to learn the pattern in sequences like [10, 20, 30], [25, 35, 45], etc., and predict the subsequent number. For instance, given the input sequence [10, 20, 30], the model should learn to predict 40.
First, we need some data. We'll create synthetic sequences where each number is 10 greater than the previous one. We'll generate pairs of (input_sequence, target_value), using sequences of length 3 as input to predict the 4th element.
import numpy as np

def generate_sequences(n_sequences=1000, sequence_length=4):
    """Generates arithmetic sequences (step=10) and splits them into X, y."""
    X, y = [], []
    for _ in range(n_sequences):
        start_value = np.random.randint(0, 100)
        sequence = [start_value + j * 10 for j in range(sequence_length)]
        X.append(sequence[:-1])  # Input sequence (first 3 elements)
        y.append(sequence[-1])   # Target value (last element)
    return np.array(X), np.array(y)
# Generate 1000 samples, each sequence has 4 numbers total
# Input sequence length (time steps) will be 3
X_train_raw, y_train = generate_sequences(n_sequences=1000, sequence_length=4)
print("Sample Input Sequence (X):", X_train_raw[0])
print("Sample Target Value (y):", y_train[0])
print("Shape of X_train_raw:", X_train_raw.shape)
print("Shape of y_train:", y_train.shape)
# Expected Output:
# Sample Input Sequence (X): [start_val, start_val+10, start_val+20] (e.g., [42 52 62])
# Sample Target Value (y): start_val+30 (e.g., 72)
# Shape of X_train_raw: (1000, 3)
# Shape of y_train: (1000,)
As discussed earlier, standard RNN layers in frameworks like TensorFlow or PyTorch expect input data in a specific 3D format: (batch_size, time_steps, features).

batch_size: The number of sequences processed in one go during training. We'll let the framework handle this during training, so we focus on the shape of individual samples first.
time_steps: The length of the input sequence. In our case, this is 3 (e.g., [10, 20, 30]).
features: The number of features at each time step. Since each number in our sequence is a single value, the number of features is 1.

Our current X_train_raw has the shape (1000, 3). We need to reshape it to (1000, 3, 1).
# Reshape X to be [samples, time_steps, features]
n_samples = X_train_raw.shape[0]
n_time_steps = X_train_raw.shape[1]
n_features = 1 # Each time step has one feature (the number itself)
X_train = X_train_raw.reshape((n_samples, n_time_steps, n_features))
print("Reshaped X_train shape:", X_train.shape)
# Expected Output:
# Reshaped X_train shape: (1000, 3, 1)
It's also good practice to normalize the data for better training stability, although this simple arithmetic task might converge even without it. Let's apply simple scaling by dividing by a constant (here, 100) for demonstration.
# Normalize the data (optional but good practice)
X_train = X_train / 100.0
y_train = y_train / 100.0
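Dividing by a fixed constant works here because we know the data range in advance. If you'd prefer to fit the scaling to the data, scikit-learn's MinMaxScaler is a common alternative. The sketch below is optional, assumes scikit-learn is available, and would replace the division above (it operates on the raw, unscaled arrays).

from sklearn.preprocessing import MinMaxScaler

# Fit one scaler on all training values (inputs and targets) so they share a single scale.
all_values = np.concatenate([X_train_raw.reshape(-1, 1), y_train.reshape(-1, 1)])
scaler = MinMaxScaler()
scaler.fit(all_values)

X_train = scaler.transform(X_train_raw.reshape(-1, 1)).reshape((n_samples, n_time_steps, n_features))
y_train = scaler.transform(y_train.reshape(-1, 1)).ravel()
# After prediction, scaler.inverse_transform(...) maps values back to the original range.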
Now, let's construct the model using TensorFlow's Keras API. We'll use a Sequential model containing:

A SimpleRNN layer: This is the core recurrent layer. We need to specify the number of units (neurons) in the hidden state. Let's start with 5 units. We also need to provide the input_shape, which corresponds to (time_steps, features), excluding the batch size.
A Dense layer: This is a standard fully connected layer that will produce the final output prediction. Since we are predicting a single number, it will have 1 unit.

# Import TensorFlow
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN, Dense
# Define the model
model = Sequential([
    # SimpleRNN layer with 5 hidden units.
    # input_shape is (time_steps, features) -> (3, 1)
    SimpleRNN(5, input_shape=(n_time_steps, n_features)),
    # Output layer: Dense layer with 1 unit for predicting the single next value
    Dense(1)
])
# Display the model's architecture
model.summary()
The model.summary() output will show the layers, their output shapes, and the number of parameters. Notice how the SimpleRNN layer processes the sequence and outputs a shape like (None, 5), where None represents the batch size and 5 is the number of hidden units. This output (the final hidden state) is then fed into the Dense layer.
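As a quick sanity check on the summary, you can reproduce the parameter counts by hand: a SimpleRNN layer with u units and f input features has u * (u + f + 1) trainable weights (recurrent weights, input weights, and biases), and the Dense layer adds u + 1. The short sketch below just performs that arithmetic for our configuration; the totals should match what model.summary() reports.

# Expected parameter counts for SimpleRNN(5) on 1 feature, followed by Dense(1)
units = 5
features = 1
rnn_params = units * (units + features + 1)    # 5 * 7 = 35 (recurrent + input weights + biases)
dense_params = units + 1                       # 5 weights + 1 bias = 6
print("SimpleRNN parameters:", rnn_params)     # 35
print("Dense parameters:", dense_params)       # 6
print("Total parameters:", rnn_params + dense_params)  # 41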
Before training, we need to compile the model. This involves specifying:
optimizer: The algorithm used to update the model weights (e.g., 'adam', 'rmsprop'). Adam is a common and effective choice.
loss: The function used to measure the difference between the model's predictions and the actual target values. Since this is a regression task (predicting a continuous value), Mean Squared Error ('mse') is appropriate.

# Compile the model
model.compile(optimizer='adam', loss='mse')
We can now train the model using the fit method. We provide the training data (X_train, y_train), specify the number of epochs (passes through the entire dataset), and optionally set a batch_size. We'll also use a small portion of the data for validation during training to monitor performance on unseen examples.
# Train the model
# Use a portion of the data for validation (e.g., 20%)
history = model.fit(X_train, y_train, epochs=30, batch_size=32, validation_split=0.2, verbose=1)
# verbose=1 shows progress bar
# verbose=2 shows one line per epoch
# verbose=0 shows nothing
During training, you'll see the loss decreasing over epochs for both the training and validation sets. This indicates the model is learning the pattern.
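You can also read the recorded values directly: history.history is a plain Python dictionary keyed by metric name, so a couple of print statements confirm the downward trend numerically.

# Inspect the first and last recorded losses
print("Initial training loss:", history.history['loss'][0])
print("Final training loss:  ", history.history['loss'][-1])
print("Final validation loss:", history.history['val_loss'][-1])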
We can visualize the training process by plotting the loss:
import matplotlib.pyplot as plt
# Plot training & validation loss values
plt.figure(figsize=(10, 6))
plt.plot(history.history['loss'], color='#1c7ed6', label='Train Loss')
plt.plot(history.history['val_loss'], color='#fd7e14', linestyle='--', label='Validation Loss')
plt.title('Model Loss During Training')
plt.ylabel('Loss (MSE)')
plt.xlabel('Epoch')
plt.legend(loc='upper right')
plt.grid(True, linestyle='--', alpha=0.6)
plt.show()
The resulting plot should show both training and validation loss decreasing over epochs, indicating that the model is learning successfully.
Now, let's test the trained model. We'll create a new input sequence, preprocess it exactly like the training data (normalize and reshape), and use model.predict to get the output. Remember to scale the prediction back to the original range.
# Example: Predict the next number after [50, 60, 70]
# Expected prediction: 80
input_sequence_raw = np.array([50, 60, 70])
# 1. Normalize
input_sequence_normalized = input_sequence_raw / 100.0
# 2. Reshape to (1, time_steps, features) -> (1, 3, 1)
input_sequence_reshaped = input_sequence_normalized.reshape((1, n_time_steps, n_features))
# 3. Predict
predicted_normalized = model.predict(input_sequence_reshaped)
# 4. Denormalize the prediction
predicted_value = predicted_normalized[0, 0] * 100.0
print(f"Input sequence: {input_sequence_raw}")
print(f"Predicted next value: {predicted_value:.2f}")
# Expected Output:
# Input sequence: [50 60 70]
# Predicted next value: ~80.00 (might be slightly off, e.g., 79.85)
The model should predict a value very close to 80, demonstrating that it has learned the simple arithmetic progression from the training data.
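To convince yourself that the model has learned the pattern rather than memorized one example, you can run several test sequences through the same preprocessing in a single model.predict call. The start values below are just illustrative choices within the training range.

# Predict several sequences at once by stacking them into a single batch
test_sequences = np.array([
    [5, 15, 25],     # expect ~35
    [33, 43, 53],    # expect ~63
    [80, 90, 100],   # expect ~110
])
test_batch = (test_sequences / 100.0).reshape((-1, n_time_steps, n_features))
predictions = model.predict(test_batch) * 100.0
for sequence, prediction in zip(test_sequences, predictions):
    print(f"{sequence} -> {prediction[0]:.2f}")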
This hands-on example walked through the essential steps of using a SimpleRNN for a basic sequence task: generating data, preprocessing it into the correct shape, building the model architecture, training, and making predictions. While the task is simple, this workflow forms the foundation for tackling more complex sequence modeling problems. With longer sequences you'll encounter challenges like vanishing gradients, which motivate the more advanced LSTM and GRU architectures covered in the following chapters.