Now that we have a conceptual grasp of how Recurrent Neural Networks operate on sequential data using hidden states, let's put that into practice by building a simple RNN model using PyTorch's torch.nn library. PyTorch provides a convenient module, nn.RNN, that encapsulates the core RNN logic.

The nn.RNN Module

The primary building block for basic RNNs in PyTorch is the torch.nn.RNN class. When you create an instance of this class, you are creating an RNN layer (or potentially multiple stacked layers) that can process sequences.

At its core, an nn.RNN layer takes an input sequence and an optional initial hidden state. It then iterates through the timesteps of the input sequence, updating its hidden state at each step based on the current input and the previous hidden state. It produces an output sequence (the hidden state at each timestep) and the final hidden state after processing the entire sequence.
To initialize an nn.RNN layer, you need to specify several important parameters:

- input_size: This defines the number of expected features in the input x at each timestep. For example, if you are processing word embeddings of dimension 300, input_size would be 300.
- hidden_size: This determines the number of features in the hidden state h. It also defines the dimension of the outputs at each timestep. The choice of hidden_size is a hyperparameter that impacts the model's capacity.
- num_layers: This allows you to stack multiple RNN layers on top of each other. The output sequence of the first layer becomes the input sequence for the second layer, and so on. Default is 1. Stacking layers can sometimes help the model learn more complex temporal patterns.
- nonlinearity: The non-linearity to use. Can be either 'tanh' (the default) or 'relu'.
- batch_first: A boolean parameter. If True, then the input and output tensors are provided as (batch_size, seq_len, feature_dim). If False (the default), they are (seq_len, batch_size, feature_dim). Setting this to True is often more intuitive when working with data loaders that yield batches of sequences.
- dropout: If non-zero, introduces a Dropout layer on the outputs of each RNN layer except the last layer, with dropout probability equal to dropout. Default: 0.
- bidirectional: If True, becomes a bidirectional RNN. Default: False. We will focus on unidirectional RNNs for now.

Understanding the expected shapes of input and output tensors is essential for using nn.RNN correctly. Let's assume batch_first=True for clarity, as it's commonly used.
- Input tensor: shape (batch_size, seq_len, input_size).
  - batch_size: The number of sequences in the batch.
  - seq_len: The length of each sequence (number of timesteps).
  - input_size: The number of features at each timestep (matching the input_size parameter of nn.RNN).
- Initial hidden state (h_0): (Optional) If you want to provide an initial hidden state, it should have the shape (num_layers, batch_size, hidden_size). If not provided, it defaults to a tensor of zeros.
- Output sequence (output): This tensor contains the output features (hidden state) from the last RNN layer for each timestep. Its shape is (batch_size, seq_len, hidden_size).
- Final hidden state (h_n): This tensor contains the final hidden state for each RNN layer after processing the entire sequence. Its shape is (num_layers, batch_size, hidden_size). You might use this final hidden state as input to a subsequent layer (like a linear layer for classification).

If batch_first=False (the default), the batch_size and seq_len dimensions are swapped in the input and output sequence tensors. The hidden state tensors (h_0, h_n) always have batch_size as the second dimension, regardless of batch_first.
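The snippet below illustrates these shapes with a small stand-alone example; the specific numbers (input_size=300, hidden_size=128, two layers, a batch of 8, sequences of length 25) are arbitrary choices for demonstration.

import torch
import torch.nn as nn

rnn = nn.RNN(input_size=300, hidden_size=128, num_layers=2, batch_first=True)

x = torch.randn(8, 25, 300)   # (batch_size=8, seq_len=25, input_size=300)
h0 = torch.zeros(2, 8, 128)   # (num_layers, batch_size, hidden_size); optional

output, h_n = rnn(x, h0)

print(output.shape)           # torch.Size([8, 25, 128]) -> (batch_size, seq_len, hidden_size)
print(h_n.shape)              # torch.Size([2, 8, 128])  -> (num_layers, batch_size, hidden_size)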
Let's create a basic RNN model using nn.Module. This model will contain a single nn.RNN layer followed by an nn.Linear layer that maps the final hidden state of the sequence to an output prediction. This pattern is common in sequence classification tasks.
import torch
import torch.nn as nn


class SimpleRNNModel(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim, num_rnn_layers=1):
        """
        Initializes the SimpleRNNModel.

        Args:
            input_dim (int): Dimension of input features per timestep.
            hidden_dim (int): Dimension of the RNN hidden state.
            output_dim (int): Dimension of the final output.
            num_rnn_layers (int): Number of stacked RNN layers. Default is 1.
        """
        super().__init__()  # Call the __init__ of the parent class (nn.Module)
        self.hidden_dim = hidden_dim
        self.num_rnn_layers = num_rnn_layers

        # Define the RNN layer
        # batch_first=True means input/output tensors shape: (batch, seq, feature)
        self.rnn = nn.RNN(
            input_size=input_dim,
            hidden_size=hidden_dim,
            num_layers=num_rnn_layers,
            batch_first=True,    # Make sure input shape is (batch, seq_len, input_size)
            nonlinearity='tanh'  # Default activation
        )

        # Define the output layer (fully connected)
        # It takes the final hidden state of the RNN as input
        self.fc = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        """
        Defines the forward pass of the model.

        Args:
            x (torch.Tensor): Input tensor of shape (batch_size, seq_len, input_dim).

        Returns:
            torch.Tensor: Output tensor of shape (batch_size, output_dim).
        """
        # Initialize hidden state with zeros
        # Shape: (num_layers, batch_size, hidden_size)
        batch_size = x.size(0)
        h0 = torch.zeros(self.num_rnn_layers, batch_size, self.hidden_dim).to(x.device)

        # Pass data through RNN layer
        # rnn_out shape: (batch_size, seq_len, hidden_size)
        # hn shape: (num_layers, batch_size, hidden_size)
        rnn_out, hn = self.rnn(x, h0)

        # We only need the hidden state from the last time step of the last layer
        # hn contains the final hidden states for all layers.
        # hn[-1] accesses the final hidden state of the last layer.
        # Shape of hn[-1]: (batch_size, hidden_size)
        last_layer_hidden_state = hn[-1]

        # Pass the last hidden state through the fully connected layer
        # out shape: (batch_size, output_dim)
        out = self.fc(last_layer_hidden_state)
        return out


# --- Example Usage ---
# Define model parameters
INPUT_DIM = 10   # Input feature dimension (e.g., embedding size)
HIDDEN_DIM = 20  # Hidden state dimension
OUTPUT_DIM = 5   # Output dimension (e.g., number of classes)
NUM_LAYERS = 1   # Number of RNN layers

# Create the model
model = SimpleRNNModel(INPUT_DIM, HIDDEN_DIM, OUTPUT_DIM, NUM_LAYERS)
print("Model Architecture:")
print(model)

# Create some dummy input data
BATCH_SIZE = 4
SEQ_LEN = 15
dummy_input = torch.randn(BATCH_SIZE, SEQ_LEN, INPUT_DIM)  # Shape: (batch, seq, feature)

# Perform a forward pass
output = model(dummy_input)

print(f"\nInput shape: {dummy_input.shape}")
print(f"Output shape: {output.shape}")

# Verify output shape matches (BATCH_SIZE, OUTPUT_DIM)
assert output.shape == (BATCH_SIZE, OUTPUT_DIM)
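One relationship worth verifying while model and dummy_input are at hand: for a unidirectional RNN, the final hidden state of the last layer is exactly the output at the last timestep. The check below is a small sketch that calls the model's internal self.rnn layer directly, since the model itself returns only the classification output.

# Sanity check: for a unidirectional RNN, hn[-1] equals the last timestep of rnn_out.
with torch.no_grad():
    rnn_out, hn = model.rnn(dummy_input)              # h0 defaults to zeros when omitted
    assert torch.allclose(hn[-1], rnn_out[:, -1, :])  # both are (batch_size, hidden_size)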
Initialization (__init__): We define the nn.RNN layer, specifying input_dim, hidden_dim, num_rnn_layers, and importantly, batch_first=True. We also define a standard nn.Linear layer (self.fc) that will take the final hidden state from the RNN as input and produce the model's final output.

Forward pass (forward):

- We get the batch_size from the input tensor x.
- h0 is created as a tensor of zeros with the required shape (num_layers, batch_size, hidden_dim). We ensure it's on the same device as the input x using .to(x.device).
- x and the initial hidden state h0 are passed to the self.rnn layer. It returns two tensors: rnn_out (the hidden states for all timesteps from the last layer) and hn (the final hidden state for all layers after the last timestep).
- hn has the shape (num_layers, batch_size, hidden_size). We select the final hidden state from the last layer using hn[-1], which gives us a tensor of shape (batch_size, hidden_size).
- hn[-1] is passed through the fully connected layer self.fc to get the final output tensor of shape (batch_size, output_dim).

Example usage: We create the model and some dummy input data in the (batch_size, seq_len, input_size) format (because we set batch_first=True). Running the model produces an output tensor, and we verify its shape is (batch_size, output_dim), suitable for tasks like multi-class classification where each sequence in the batch gets assigned one output vector.

This example demonstrates the fundamental structure for using nn.RNN within a custom PyTorch model. You can adapt this pattern for various sequence processing tasks, for example by producing an output at every timestep as sketched below. The next section will delve deeper into preparing sequential data in the correct format expected by these RNN layers.
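As one such adaptation (a sketch for a token-level task such as tagging, which goes beyond the example above; the class name SimpleRNNTagger is hypothetical): apply the linear layer to rnn_out at every timestep instead of only to hn[-1], so each position in the sequence gets its own prediction.

import torch.nn as nn

class SimpleRNNTagger(nn.Module):
    """Sketch of a per-timestep variant: one output vector for every position."""
    def __init__(self, input_dim, hidden_dim, output_dim, num_rnn_layers=1):
        super().__init__()
        self.rnn = nn.RNN(input_dim, hidden_dim, num_layers=num_rnn_layers, batch_first=True)
        self.fc = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        # rnn_out: (batch_size, seq_len, hidden_size) -- use every timestep, not just the last
        rnn_out, _ = self.rnn(x)   # h0 defaults to zeros when omitted
        return self.fc(rnn_out)    # (batch_size, seq_len, output_dim)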