Now that we have a conceptual grasp of how Recurrent Neural Networks operate on sequential data using hidden states, let's put that into practice by building a simple RNN model using PyTorch's torch.nn library. PyTorch provides a convenient module, nn.RNN, that encapsulates the core RNN logic.

The nn.RNN Module

The primary building block for basic RNNs in PyTorch is the torch.nn.RNN class. When you create an instance of this class, you are creating an RNN layer (or potentially multiple stacked layers) that can process sequences.

At its core, an nn.RNN layer takes an input sequence and an optional initial hidden state. It then iterates through the timesteps of the input sequence, updating its hidden state at each step based on the current input and the previous hidden state. It produces an output sequence (the hidden state at each timestep) and the final hidden state after processing the entire sequence.
To initialize an nn.RNN layer, you need to specify several important parameters:

- input_size: This defines the number of expected features in the input x at each timestep. For example, if you are processing word embeddings of dimension 300, input_size would be 300.
- hidden_size: This determines the number of features in the hidden state h. It also defines the dimension of the outputs at each timestep. The choice of hidden_size is a hyperparameter that impacts the model's capacity.
- num_layers: This allows you to stack multiple RNN layers on top of each other. The output sequence of the first layer becomes the input sequence for the second layer, and so on. Default is 1. Stacking layers can sometimes help the model learn more complex temporal patterns.
- nonlinearity: The non-linearity to use. Can be either 'tanh' (the default) or 'relu'.
- batch_first: A boolean parameter. If True, then the input and output tensors are provided as (batch_size, seq_len, feature_dim). If False (the default), they are (seq_len, batch_size, feature_dim). Setting this to True is often more intuitive when working with data loaders that yield batches of sequences.
- dropout: If non-zero, introduces a Dropout layer on the outputs of each RNN layer except the last layer, with dropout probability equal to dropout. Default: 0.
- bidirectional: If True, becomes a bidirectional RNN. Default: False. We will focus on unidirectional RNNs for now.

Understanding the expected shapes of input and output tensors is essential for using nn.RNN correctly. Let's assume batch_first=True for clarity, as it's commonly used.
- Input tensor: shape (batch_size, seq_len, input_size).
  - batch_size: The number of sequences in the batch.
  - seq_len: The length of each sequence (number of timesteps).
  - input_size: The number of features at each timestep (matching the input_size parameter of nn.RNN).
- Initial hidden state (h_0): (Optional) If you want to provide an initial hidden state, it should have the shape (num_layers, batch_size, hidden_size). If not provided, it defaults to a tensor of zeros.
- Output sequence (output): This tensor contains the output features (hidden state) from the last RNN layer for each timestep. Its shape is (batch_size, seq_len, hidden_size).
- Final hidden state (h_n): This tensor contains the final hidden state for each RNN layer after processing the entire sequence. Its shape is (num_layers, batch_size, hidden_size). You might use this final hidden state as input to a subsequent layer (like a linear layer for classification).

If batch_first=False (the default), the batch_size and seq_len dimensions are swapped in the input and output sequence tensors. The hidden state tensors (h_0, h_n) always have batch_size as the second dimension, regardless of batch_first.
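The snippet below illustrates these shapes with a small stand-alone example; the specific numbers (input_size=300, hidden_size=128, two layers, a batch of 8, sequences of length 25) are arbitrary choices for demonstration.

import torch
import torch.nn as nn

rnn = nn.RNN(input_size=300, hidden_size=128, num_layers=2, batch_first=True)

x = torch.randn(8, 25, 300)   # (batch_size=8, seq_len=25, input_size=300)
h0 = torch.zeros(2, 8, 128)   # (num_layers, batch_size, hidden_size); optional

output, h_n = rnn(x, h0)

print(output.shape)           # torch.Size([8, 25, 128]) -> (batch_size, seq_len, hidden_size)
print(h_n.shape)              # torch.Size([2, 8, 128])  -> (num_layers, batch_size, hidden_size)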
Let's create a basic RNN model using nn.Module. This model will contain a single nn.RNN layer followed by an nn.Linear layer that maps the final hidden state of the sequence to an output prediction. This pattern is common in sequence classification tasks.
import torch
import torch.nn as nn


class SimpleRNNModel(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim, num_rnn_layers=1):
        """
        Initializes the SimpleRNNModel.

        Args:
            input_dim (int): Dimension of input features per timestep.
            hidden_dim (int): Dimension of the RNN hidden state.
            output_dim (int): Dimension of the final output.
            num_rnn_layers (int): Number of stacked RNN layers. Default is 1.
        """
        super().__init__()  # Call the __init__ of the parent class (nn.Module)
        self.hidden_dim = hidden_dim
        self.num_rnn_layers = num_rnn_layers

        # Define the RNN layer
        # batch_first=True means input/output tensors shape: (batch, seq, feature)
        self.rnn = nn.RNN(
            input_size=input_dim,
            hidden_size=hidden_dim,
            num_layers=num_rnn_layers,
            batch_first=True,    # Make sure input shape is (batch, seq_len, input_size)
            nonlinearity='tanh'  # Default activation
        )

        # Define the output layer (fully connected)
        # It takes the final hidden state of the RNN as input
        self.fc = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        """
        Defines the forward pass of the model.

        Args:
            x (torch.Tensor): Input tensor of shape (batch_size, seq_len, input_dim).

        Returns:
            torch.Tensor: Output tensor of shape (batch_size, output_dim).
        """
        # Initialize hidden state with zeros
        # Shape: (num_layers, batch_size, hidden_size)
        batch_size = x.size(0)
        h0 = torch.zeros(self.num_rnn_layers, batch_size, self.hidden_dim).to(x.device)

        # Pass data through RNN layer
        # rnn_out shape: (batch_size, seq_len, hidden_size)
        # hn shape: (num_layers, batch_size, hidden_size)
        rnn_out, hn = self.rnn(x, h0)

        # We only need the hidden state from the last time step of the last layer
        # hn contains the final hidden states for all layers.
        # hn[-1] accesses the final hidden state of the last layer.
        # Shape of hn[-1]: (batch_size, hidden_size)
        last_layer_hidden_state = hn[-1]

        # Pass the last hidden state through the fully connected layer
        # out shape: (batch_size, output_dim)
        out = self.fc(last_layer_hidden_state)
        return out


# --- Example Usage ---
# Define model parameters
INPUT_DIM = 10   # Input feature dimension (e.g., embedding size)
HIDDEN_DIM = 20  # Hidden state dimension
OUTPUT_DIM = 5   # Output dimension (e.g., number of classes)
NUM_LAYERS = 1   # Number of RNN layers

# Create the model
model = SimpleRNNModel(INPUT_DIM, HIDDEN_DIM, OUTPUT_DIM, NUM_LAYERS)
print("Model Architecture:")
print(model)

# Create some dummy input data
BATCH_SIZE = 4
SEQ_LEN = 15
dummy_input = torch.randn(BATCH_SIZE, SEQ_LEN, INPUT_DIM)  # Shape: (batch, seq, feature)

# Perform a forward pass
output = model(dummy_input)

print(f"\nInput shape: {dummy_input.shape}")
print(f"Output shape: {output.shape}")

# Verify output shape matches (BATCH_SIZE, OUTPUT_DIM)
assert output.shape == (BATCH_SIZE, OUTPUT_DIM)
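One relationship worth verifying while model and dummy_input are at hand: for a unidirectional RNN, the final hidden state of the last layer is exactly the output at the last timestep. The check below is a small sketch that calls the model's internal self.rnn layer directly, since the model itself returns only the classification output.

# Sanity check: for a unidirectional RNN, hn[-1] equals the last timestep of rnn_out.
with torch.no_grad():
    rnn_out, hn = model.rnn(dummy_input)              # h0 defaults to zeros when omitted
    assert torch.allclose(hn[-1], rnn_out[:, -1, :])  # both are (batch_size, hidden_size)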
Initialization (__init__): We define the nn.RNN layer, specifying input_dim, hidden_dim, num_rnn_layers, and importantly, batch_first=True. We also define a standard nn.Linear layer (self.fc) that will take the final hidden state from the RNN as input and produce the model's final output.

Forward pass (forward):

- We get the batch_size from the input tensor x.
- h0 is created as a tensor of zeros with the required shape (num_layers, batch_size, hidden_dim). We ensure it's on the same device as the input x using .to(x.device).
- x and the initial hidden state h0 are passed to the self.rnn layer. It returns two tensors: rnn_out (the hidden states for all timesteps from the last layer) and hn (the final hidden state for all layers after the last timestep).
- hn has the shape (num_layers, batch_size, hidden_size). We select the final hidden state from the last layer using hn[-1], which gives us a tensor of shape (batch_size, hidden_size).
- hn[-1] is passed through the fully connected layer self.fc to get the final output tensor of shape (batch_size, output_dim).

Example usage: We create the model and some dummy input data in the (batch_size, seq_len, input_size) format (because we set batch_first=True). Running the model produces an output tensor, and we verify its shape is (batch_size, output_dim), suitable for tasks like multi-class classification where each sequence in the batch gets assigned one output vector.

This example demonstrates the fundamental structure for using nn.RNN within a custom PyTorch model. You can adapt this pattern for various sequence processing tasks, for example by producing an output at every timestep as sketched below. The next section will delve deeper into preparing sequential data in the correct format expected by these RNN layers.
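As one such adaptation (a sketch for a token-level task such as tagging, which goes beyond the example above; the class name SimpleRNNTagger is hypothetical): apply the linear layer to rnn_out at every timestep instead of only to hn[-1], so each position in the sequence gets its own prediction.

import torch.nn as nn

class SimpleRNNTagger(nn.Module):
    """Sketch of a per-timestep variant: one output vector for every position."""
    def __init__(self, input_dim, hidden_dim, output_dim, num_rnn_layers=1):
        super().__init__()
        self.rnn = nn.RNN(input_dim, hidden_dim, num_layers=num_rnn_layers, batch_first=True)
        self.fc = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        # rnn_out: (batch_size, seq_len, hidden_size) -- use every timestep, not just the last
        rnn_out, _ = self.rnn(x)   # h0 defaults to zeros when omitted
        return self.fc(rnn_out)    # (batch_size, seq_len, output_dim)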