Let's put the concepts of Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) into practice. In this section, you'll implement basic versions of both architectures using PyTorch's nn.Module and the specific layers introduced earlier in the chapter. This hands-on experience will solidify your understanding of how these models are constructed and how data flows through them.
We'll focus on defining the model structure and understanding the input/output dimensions, building directly on the nn.Module concepts from Chapter 4 and the layer descriptions from earlier in this chapter. Remember, these are simplified examples; integrating them into a full training loop would involve adding data loading (Chapter 5), loss functions, optimizers, and the training logic (Chapter 6).
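To make that division of labor concrete, here is a schematic sketch of the pieces a full training loop adds. This is our own illustration, not the chapter's code: the stand-in nn.Linear model and the random tensors exist only so the loop runs end to end, and the details are covered in Chapter 6.

import torch
import torch.nn as nn

# Stand-in model and data, used only to make the sketch runnable
model = nn.Linear(8, 2)                                   # placeholder for a real network
criterion = nn.CrossEntropyLoss()                         # loss function
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)  # optimizer

inputs = torch.randn(16, 8)             # stand-in for one DataLoader batch
labels = torch.randint(0, 2, (16,))     # stand-in class labels

optimizer.zero_grad()                    # clear gradients from the previous step
loss = criterion(model(inputs), labels)  # forward pass and loss computation
loss.backward()                          # backpropagation
optimizer.step()                         # parameter update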
CNNs excel at processing grid-like data, such as images. Let's build a simple CNN that could be used for image classification. We'll define a network with convolutional layers, activation functions, pooling layers, and a final fully connected layer.
We create a class inheriting from nn.Module. Inside __init__, we define the layers we need: nn.Conv2d for convolution, nn.ReLU for activation, nn.MaxPool2d for pooling, and nn.Linear for the final classification layer. The forward method defines how input data flows through these layers.
import torch
import torch.nn as nn

class SimpleCNN(nn.Module):
    def __init__(self, num_classes=10):
        super(SimpleCNN, self).__init__()
        # Input shape: (Batch, 1, 28, 28) - assuming grayscale images like MNIST
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=16, kernel_size=3, stride=1, padding=1)
        # Shape after conv1: (Batch, 16, 28, 28) -> (28 - 3 + 2*1)/1 + 1 = 28
        self.relu1 = nn.ReLU()
        self.pool1 = nn.MaxPool2d(kernel_size=2, stride=2)
        # Shape after pool1: (Batch, 16, 14, 14) -> 28 / 2 = 14
        self.conv2 = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3, stride=1, padding=1)
        # Shape after conv2: (Batch, 32, 14, 14) -> (14 - 3 + 2*1)/1 + 1 = 14
        self.relu2 = nn.ReLU()
        self.pool2 = nn.MaxPool2d(kernel_size=2, stride=2)
        # Shape after pool2: (Batch, 32, 7, 7) -> 14 / 2 = 7
        # Flatten the output for the linear layer
        # Flattened size = 32 * 7 * 7 = 1568
        self.fc = nn.Linear(32 * 7 * 7, num_classes)

    def forward(self, x):
        # Apply first convolutional block
        out = self.conv1(x)
        out = self.relu1(out)
        out = self.pool1(out)
        # Apply second convolutional block
        out = self.conv2(out)
        out = self.relu2(out)
        out = self.pool2(out)
        # Flatten the output from the convolutional layers
        # out.size(0) keeps the batch dimension; -1 infers the flattened feature size
        out = out.view(out.size(0), -1)
        # Apply the fully connected layer
        out = self.fc(out)
        return out
In this example:

- nn.Conv2d(in_channels=1, out_channels=16, ...): Takes 1 input channel and applies 16 filters. kernel_size=3, stride=1, padding=1 are common choices that preserve the spatial dimensions after convolution.
- nn.MaxPool2d(kernel_size=2, stride=2): Reduces the height and width by half.
- out.view(out.size(0), -1): Flattens the tensor from shape (Batch, 32, 7, 7) to (Batch, 32 * 7 * 7) = (Batch, 1568) so it can be fed into the linear layer.
- nn.Linear(32 * 7 * 7, num_classes): The final layer maps the flattened features to the desired number of output classes.

Let's create some dummy input data matching the expected shape (Batch Size, Channels, Height, Width) and pass it through our network to see the output shape.
# Instantiate the model
cnn_model = SimpleCNN(num_classes=10)
# Create a dummy input batch (e.g., 4 images, 1 channel, 28x28 pixels)
# requires_grad=False (the default) since this is just a forward-pass demonstration
dummy_input_cnn = torch.randn(4, 1, 28, 28, requires_grad=False)
# Perform a forward pass
output_cnn = cnn_model(dummy_input_cnn)
# Print input and output shapes
print(f"Input shape: {dummy_input_cnn.shape}")
print(f"Output shape: {output_cnn.shape}")
Running this should output:
Input shape: torch.Size([4, 1, 28, 28])
Output shape: torch.Size([4, 10])
This confirms our network takes a batch of 4 images and outputs predictions for 10 classes for each image. Notice how the forward method dictates the data flow, and how we need to calculate the flattened size for the linear layer based on the output shape of the final pooling layer. You can revisit the section "Understanding Input/Output Shapes for CNN Layers" to practice calculating these dimensions manually.
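If you prefer to verify the arithmetic in code rather than by hand, a small helper like the following (our own addition, not part of the chapter's models) applies the standard output-size formula floor((W - K + 2P) / S) + 1 used in the comments above. The same formula covers max pooling, since pooling windows slide just like convolution kernels.

def conv2d_output_size(size, kernel_size, stride=1, padding=0):
    # Standard formula for one spatial dimension: floor((W - K + 2P) / S) + 1
    return (size - kernel_size + 2 * padding) // stride + 1

# Reproduce the shape trace from SimpleCNN's comments
after_conv1 = conv2d_output_size(28, kernel_size=3, stride=1, padding=1)           # 28
after_pool1 = conv2d_output_size(after_conv1, kernel_size=2, stride=2)             # 14
after_conv2 = conv2d_output_size(after_pool1, kernel_size=3, stride=1, padding=1)  # 14
after_pool2 = conv2d_output_size(after_conv2, kernel_size=2, stride=2)             # 7
print(after_conv1, after_pool1, after_conv2, after_pool2)  # 28 14 14 7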
RNNs are designed for sequential data. Let's build a simple RNN that could, for example, process sequences of characters or sensor readings.
We'll use the nn.RNN layer. Remember that, by default, RNN layers expect input in the format (Sequence Length, Batch Size, Input Features).
import torch
import torch.nn as nn

class SimpleRNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, num_layers=1):
        super(SimpleRNN, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        # RNN layer
        # batch_first=False by default, expects input: (Seq_len, Batch, Input_feature)
        self.rnn = nn.RNN(input_size, hidden_size, num_layers, batch_first=False)
        # Fully connected layer to map RNN output to final output size
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x, h0=None):
        # x shape: (Seq_len, Batch, Input_feature)
        # Initialize hidden state if not provided
        # Shape: (Num_layers * Num_directions, Batch, Hidden_size)
        if h0 is None:
            h0 = torch.zeros(self.num_layers, x.size(1), self.hidden_size).to(x.device)
        # Pass data through RNN layer
        # out shape: (Seq_len, Batch, Hidden_size) -> output features for each time step
        # hn shape: (Num_layers * Num_directions, Batch, Hidden_size) -> final hidden state
        out, hn = self.rnn(x, h0)
        # We can choose to use the output of the last time step
        # out[-1] shape: (Batch, Hidden_size)
        # Alternatively, process the entire sequence 'out' if needed
        out_last_step = out[-1, :, :]
        # Pass the output of the last time step through the linear layer
        final_output = self.fc(out_last_step)
        # final_output shape: (Batch, Output_size)
        return final_output, hn  # Return final output and last hidden state
In this example:

- input_size: The number of features at each step in the sequence.
- hidden_size: The number of features in the hidden state.
- num_layers: The number of stacked RNN layers.
- nn.RNN(...): The core RNN layer. batch_first=False is the default, meaning the sequence length dimension comes first.
- The forward method takes the input sequence x and an optional initial hidden state h0. If h0 is not provided, it's initialized to zeros.
- The nn.RNN layer returns out (outputs for every time step) and hn (the final hidden state).
- We use the output of the last time step (out[-1, :, :]) for sequence classification or prediction tasks, passing it through a final linear layer.

Let's create a dummy sequence and pass it through our RNN.
# Define parameters
input_features = 10 # e.g., embedding dimension for characters/words
hidden_nodes = 20
output_classes = 5 # e.g., predict one of 5 categories based on the sequence
sequence_length = 15
batch_size = 4
# Instantiate the model
rnn_model = SimpleRNN(input_size=input_features, hidden_size=hidden_nodes, output_size=output_classes)
# Create a dummy input batch (Sequence Length, Batch Size, Input Features)
# requires_grad=False (the default) for this demonstration
dummy_input_rnn = torch.randn(sequence_length, batch_size, input_features, requires_grad=False)
# Perform a forward pass (without providing h0, it will be initialized)
output_rnn, final_hidden_state = rnn_model(dummy_input_rnn)
# Print input and output shapes
print(f"Input sequence shape: {dummy_input_rnn.shape}")
print(f"Output prediction shape: {output_rnn.shape}")
print(f"Final hidden state shape: {final_hidden_state.shape}")
Running this should produce output similar to:
Input sequence shape: torch.Size([15, 4, 10])
Output prediction shape: torch.Size([4, 5])
Final hidden state shape: torch.Size([1, 4, 20])
This shows the model processes a batch of 4 sequences, each 15 steps long with 10 features per step. It outputs a final prediction vector of size 5 for each sequence in the batch, along with the final hidden state. The hidden state shape reflects (Num Layers, Batch Size, Hidden Size).
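As a side note (our own addition), if you find the (Batch, Sequence, Features) ordering more natural, nn.RNN accepts batch_first=True. The input and output then put the batch dimension first, but the hidden state keeps its (Num Layers, Batch Size, Hidden Size) layout:

# With batch_first=True, input and output use (Batch, Seq_len, Features)
rnn_bf = nn.RNN(input_size=10, hidden_size=20, num_layers=1, batch_first=True)
x_bf = torch.randn(4, 15, 10)  # (Batch, Seq_len, Input_features)
out_bf, hn_bf = rnn_bf(x_bf)   # h0 defaults to zeros when omitted
print(out_bf.shape)  # torch.Size([4, 15, 20])
print(hn_bf.shape)   # torch.Size([1, 4, 20]) - unchanged by batch_first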
Now that you've implemented basic versions of these architectures, try experimenting:

- Change the kernel_size, stride, or padding in the nn.Conv2d layers. Predict the output shape before running the code. How does padding='same' (when stride=1) affect the output dimensions? Remember to update the input size of the final nn.Linear layer if the flattened shape changes.
- Change the out_channels in the convolutional layers.
- Increase num_layers in the SimpleRNN. Observe the shape of the initial hidden state h0 and the final hidden state hn.
- Change the hidden_size.
- Replace nn.RNN with nn.LSTM or nn.GRU. Note that nn.LSTM handles a tuple of hidden states (hidden state and cell state). You'll need to adjust the initialization and handling of hidden states accordingly; the input/output shapes largely follow the same pattern. A sketch of this swap follows the list.
- Modify the forward method to use the outputs from all time steps (out) instead of just the last one, perhaps by applying the linear layer to every step or using an aggregation method like averaging.
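Here is a minimal sketch of that nn.LSTM swap, assuming the same interface as SimpleRNN above (the class name SimpleLSTM is ours):

class SimpleLSTM(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, num_layers=1):
        super(SimpleLSTM, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=False)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x, state=None):
        if state is None:
            # nn.LSTM expects a (hidden state, cell state) tuple, both zero-initialized
            h0 = torch.zeros(self.num_layers, x.size(1), self.hidden_size).to(x.device)
            c0 = torch.zeros(self.num_layers, x.size(1), self.hidden_size).to(x.device)
            state = (h0, c0)
        # out: (Seq_len, Batch, Hidden_size); hn, cn: (Num_layers, Batch, Hidden_size)
        out, (hn, cn) = self.lstm(x, state)
        # Classify from the last time step, as in SimpleRNN
        return self.fc(out[-1, :, :]), (hn, cn)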
This practice provides a concrete foundation for building CNNs and RNNs. By understanding how to define these layers, connect them in a forward method, and manage their input/output shapes, you are well-equipped to construct and adapt these powerful architectures for various deep learning tasks using PyTorch.