Now that we understand how to define individual SimpleRNN layers using framework APIs and how to prepare input data with the correct shape (typically (batch_size, time_steps, features)), let's combine these components to construct a complete, basic RNN model.
Most sequence models follow a common pattern: processing the input sequence through one or more recurrent layers and then mapping the final recurrent state (or sequence of states) to the desired output using standard feedforward layers.
A typical basic RNN model for sequence tasks might involve these layers:
- An Embedding layer (for text or other categorical input) that maps integer token indices to dense vectors.
- A SimpleRNN layer that processes the sequence and produces a hidden state summarizing it.
- A Dense output layer, for example with num_classes units and softmax activation for multi-class classification.
Let's see how this looks in popular frameworks.
Using TensorFlow (Keras API)
Keras makes stacking layers straightforward using the Sequential model API.
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
# --- Assuming these variables are defined ---
vocab_size = 10000 # Example vocabulary size for text
embedding_dim = 16 # Dimension for embedding vectors
rnn_units = 32 # Number of units in the SimpleRNN layer
num_classes = 2 # Example: for binary classification
max_sequence_length = 100 # Example maximum sequence length
# -----------------------------------------
# Example with Embedding Layer
model_text = keras.Sequential(name="SimpleRNN_Text_Classifier")
model_text.add(layers.Input(shape=(max_sequence_length,), dtype='int32', name="Input_Sequence")) # Define input shape explicitly
model_text.add(layers.Embedding(input_dim=vocab_size, output_dim=embedding_dim, name="Token_Embedding"))
model_text.add(layers.SimpleRNN(rnn_units, name="Simple_RNN")) # By default, only returns the last hidden state
model_text.add(layers.Dense(num_classes, activation='softmax', name="Output_Classifier")) # Output layer for classification
# Example without Embedding (e.g., for numerical time series)
# Input shape: (time_steps, features_per_step)
input_features = 5 # Example number of features per time step
model_numeric = keras.Sequential(name="SimpleRNN_Numeric_Regressor")
model_numeric.add(layers.Input(shape=(max_sequence_length, input_features), name="Input_TimeSeries")) # Define input shape explicitly
model_numeric.add(layers.SimpleRNN(rnn_units, name="Simple_RNN"))
model_numeric.add(layers.Dense(1, name="Output_Regressor")) # Output layer for regression
# You can print the model summary to check the architecture
print("Text Model Summary:")
model_text.summary()
print("\nNumeric Model Summary:")
model_numeric.summary()
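Beyond the summaries, a quick way to confirm that the layers connect as intended is to run a small batch of dummy data through each model and check the output shape. This is only a sanity check; the random values below stand in for real tokenized text and real time series.
import numpy as np
# Dummy batch of 4 tokenized sequences (random token ids)
dummy_tokens = np.random.randint(0, vocab_size, size=(4, max_sequence_length)).astype("int32")
print(model_text.predict(dummy_tokens).shape)    # Expected: (4, num_classes)
# Dummy batch of 4 numeric series with input_features values per time step
dummy_series = np.random.rand(4, max_sequence_length, input_features).astype("float32")
print(model_numeric.predict(dummy_series).shape) # Expected: (4, 1)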
Using PyTorch
In PyTorch, you typically define a custom class inheriting from torch.nn.Module.
import torch
import torch.nn as nn
# --- Assuming these variables are defined ---
vocab_size = 10000
embedding_dim = 16
rnn_units = 32
num_classes = 2
input_features = 5 # For numeric example
# -----------------------------------------
# Example with Embedding Layer
class SimpleRNNClassifier(nn.Module):
    def __init__(self, vocab_size, embedding_dim, rnn_units, num_classes):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        # batch_first=True makes the input shape (batch, seq_len, features)
        self.rnn = nn.RNN(embedding_dim, rnn_units, batch_first=True)
        self.fc = nn.Linear(rnn_units, num_classes)

    def forward(self, x):
        # x shape: (batch_size, seq_len)
        embedded = self.embedding(x)
        # embedded shape: (batch_size, seq_len, embedding_dim)
        # output contains the hidden state for every time step;
        # hidden contains the final hidden state
        output, hidden = self.rnn(embedded)
        # We usually use the final hidden state for classification.
        # hidden shape: (num_layers * num_directions, batch_size, rnn_units)
        # Squeeze the first dimension when num_layers and num_directions are 1
        final_hidden = hidden.squeeze(0)
        # final_hidden shape: (batch_size, rnn_units)
        out = self.fc(final_hidden)
        # out shape: (batch_size, num_classes)
        # Softmax is typically applied inside the loss function (CrossEntropyLoss),
        # so we return raw logits here.
        # If needed explicitly: return torch.softmax(out, dim=1)
        return out
# Example without Embedding (e.g., for numerical time series)
class SimpleRNNRegressor(nn.Module):
    def __init__(self, input_features, rnn_units):
        super().__init__()
        self.rnn = nn.RNN(input_features, rnn_units, batch_first=True)
        self.fc = nn.Linear(rnn_units, 1)  # Single output for regression

    def forward(self, x):
        # x shape: (batch_size, seq_len, input_features)
        output, hidden = self.rnn(x)
        final_hidden = hidden.squeeze(0)
        out = self.fc(final_hidden)
        # out shape: (batch_size, 1)
        return out
# Instantiate the models
model_text_pt = SimpleRNNClassifier(vocab_size, embedding_dim, rnn_units, num_classes)
model_numeric_pt = SimpleRNNRegressor(input_features, rnn_units)
# Print the model structure
print("PyTorch Text Model Structure:")
print(model_text_pt)
print("\nPyTorch Numeric Model Structure:")
print(model_numeric_pt)
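The same kind of sanity check works for the PyTorch models: pass dummy tensors through them and inspect the output shapes. The batch size of 4 and sequence length of 100 below are arbitrary values chosen just for this check.
# Dummy batch of token ids: shape (batch_size, seq_len)
dummy_tokens = torch.randint(0, vocab_size, (4, 100))
print(model_text_pt(dummy_tokens).shape)     # Expected: torch.Size([4, num_classes])
# Dummy batch of numeric series: shape (batch_size, seq_len, input_features)
dummy_series = torch.rand(4, 100, input_features)
print(model_numeric_pt(dummy_series).shape)  # Expected: torch.Size([4, 1])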
We can visualize this structure as a flow of data through layers. For a text classification task, it might look like this:
A typical data flow for a simple RNN model used for sequence classification, starting with input sequences and ending with predictions.
In both TensorFlow/Keras and PyTorch, creating the model involves defining the sequence of layers and ensuring the output shape of one layer matches the expected input shape of the next. Frameworks handle the weight initialization and provide mechanisms to access layer parameters. Using the summary() method (Keras) or printing the model object (PyTorch) is helpful to verify the layer connections, output shapes, and the total number of trainable parameters before proceeding.
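As a quick check on those parameter counts, the SimpleRNN layer's trainable parameters can be worked out by hand: it holds an input-to-hidden weight matrix of shape (input_dim, units), a hidden-to-hidden matrix of shape (units, units), and a bias vector of length units, for a total of units * (input_dim + units + 1). For the text model above, that is 32 * (16 + 32 + 1) = 1,568 parameters, which should match the figure reported for the Simple_RNN layer by summary().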
Once the model structure is defined and instantiated, the next step is to configure the training process. This involves choosing a loss function appropriate for the task (e.g., CategoricalCrossentropy or SparseCategoricalCrossentropy for classification, MeanSquaredError for regression), selecting an optimizer (like Adam or RMSprop), and potentially specifying metrics to monitor during training. We will detail the training loop itself in the following section.
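As a brief preview of that configuration step, the sketch below reuses the models defined earlier. The Adam optimizer, the learning rate, and the specific loss choices are illustrative defaults, not requirements.
# Keras: attach the optimizer, loss, and metrics to the model
model_text.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",  # integer labels; model already outputs softmax probabilities
    metrics=["accuracy"],
)
model_numeric.compile(optimizer="adam", loss="mse")

# PyTorch: the loss and optimizer are created separately and used inside the training loop
criterion = nn.CrossEntropyLoss()  # expects raw logits, which SimpleRNNClassifier returns
optimizer = torch.optim.Adam(model_text_pt.parameters(), lr=0.001)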