Now that you understand how to define the overall structure of a model using torch.nn.Module, let's look at some of the fundamental building blocks provided by PyTorch: layers. The torch.nn package offers a wide variety of pre-built layers that perform common operations found in neural networks. These layers encapsulate both the learnable parameters (weights and biases) and the operations themselves. Here, we'll introduce three essential types: Linear, Convolutional, and Recurrent layers.
Linear Layers (nn.Linear)
The most basic type of neural network layer is the Linear layer, also known as a fully connected or dense layer. It applies a linear (strictly speaking, affine) transformation to the incoming data. If the input tensor has shape $(*, H_{in})$, where $*$ represents any number of leading dimensions (such as the batch size) and $H_{in}$ is the number of input features, the nn.Linear layer transforms it into an output tensor of shape $(*, H_{out})$, where $H_{out}$ is the number of output features specified for the layer.
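Because nn.Linear acts only on the last dimension, any leading dimensions pass through unchanged. The short sketch below, with arbitrary sizes chosen purely for illustration, applies the same layer to a 2D batch and to a 3D tensor such as a batch of sequences:

```python
import torch
import torch.nn as nn

# Arbitrary illustrative sizes: 20 input features, 30 output features
layer = nn.Linear(in_features=20, out_features=30)

# (batch, features): one leading dimension
x2d = torch.randn(64, 20)
# (batch, seq_len, features): two leading dimensions
x3d = torch.randn(8, 15, 20)

y2d = layer(x2d)
y3d = layer(x3d)

print(y2d.shape)  # torch.Size([64, 30])
print(y3d.shape)  # torch.Size([8, 15, 30])
```

Only the last dimension has to equal in_features; the same weights are applied independently at every position of the leading dimensions.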
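Mathematically, this operation is described as $y = xW^T + b$, where $x$ is the input, $W$ is the learnable weight matrix of shape $(H_{out}, H_{in})$, and $b$ is the learnable bias vector of shape $(H_{out})$.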
You create a linear layer by specifying the number of input features and output features.
import torch
import torch.nn as nn
# Example: Expecting input features of size 20, producing output features of size 30
linear_layer = nn.Linear(in_features=20, out_features=30)
# Create a sample input tensor (batch size 64, 20 features)
input_tensor = torch.randn(64, 20)
# Pass the input through the layer
output_tensor = linear_layer(input_tensor)
print(f"Input shape: {input_tensor.shape}")
print(f"Output shape: {output_tensor.shape}")
# Expected output:
# Input shape: torch.Size([64, 20])
# Output shape: torch.Size([64, 30])
# Inspect the layer's parameters (automatically created)
print(f"\nWeight shape: {linear_layer.weight.shape}")
print(f"Bias shape: {linear_layer.bias.shape}")
# Expected output:
# Weight shape: torch.Size([30, 20])
# Bias shape: torch.Size([30])
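The weight shape printed above, (30, 20), is (out_features, in_features), which is why the formula uses $W^T$. As a quick sanity check, the sketch below recomputes $y = xW^T + b$ by hand from the layer's own parameters and compares it with the layer's forward pass; the seed and sizes are arbitrary choices for this example:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # reproducible random weights and inputs

linear_layer = nn.Linear(in_features=20, out_features=30)
x = torch.randn(64, 20)

# Built-in forward pass
y_layer = linear_layer(x)

# Manual computation of y = x W^T + b using the layer's own parameters
y_manual = x @ linear_layer.weight.T + linear_layer.bias

print(torch.allclose(y_layer, y_manual, atol=1e-5))  # True
```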
Linear layers are fundamental components in many architectures. They make up simple Multi-Layer Perceptrons (MLPs) and often serve as the final classification or regression heads in more complex models such as CNNs and RNNs.
Convolutional Layers (nn.Conv2d)
Convolutional layers are the workhorses of modern computer vision models. Unlike linear layers that operate on flattened feature vectors, convolutional layers are designed to process grid-like data, such as images, preserving spatial relationships. The primary layer for 2D data (like images) is nn.Conv2d.
It works by sliding small filters (kernels) across the input spatial dimensions (height and width). For each position of the filter, it computes a dot product between the filter's weights and the input patch under the filter, producing an element in the output feature map. This process helps detect spatial patterns like edges, corners, and textures.
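To make the "dot product between the filter and the input patch" concrete, the sketch below picks one element of the output feature map and recomputes it by hand: the filter's weights (spanning all input channels) are multiplied elementwise with the patch beneath them, summed, and the filter's bias is added. Note that PyTorch does not flip the kernel, so this is technically a cross-correlation. The chosen filter index and position are arbitrary:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

conv = nn.Conv2d(in_channels=3, out_channels=6, kernel_size=5)
x = torch.randn(1, 3, 32, 32)  # (N, C, H, W)
out = conv(x)                  # (1, 6, 28, 28)

# Pick one output element: filter f at output position (i, j)
f, i, j = 2, 4, 7

# The 3x5x5 input patch under the filter at that position (all input channels);
# with stride 1 and no padding, output (i, j) starts at input (i, j)
patch = x[0, :, i:i + 5, j:j + 5]

# Dot product of the filter's weights with the patch, plus the filter's bias
manual = (conv.weight[f] * patch).sum() + conv.bias[f]

print(torch.allclose(out[0, f, i, j], manual, atol=1e-5))  # True
```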
Key parameters for nn.Conv2d include:
in_channels: Number of channels in the input image (e.g., 3 for RGB images).
out_channels: Number of filters to apply. Each filter produces one output channel (feature map).
kernel_size: The dimensions (height, width) of the filters. Can be a single int for square kernels or a tuple (H, W).
stride (optional, default 1): How many pixels the filter moves at each step.
padding (optional, default 0): Amount of zero-padding added to the input borders.
# Example: Process a batch of 16 images, 3 channels (RGB), 32x32 pixels
# Apply 6 filters (output channels), each 5x5 in size
conv_layer = nn.Conv2d(in_channels=3, out_channels=6, kernel_size=5)
# Create a sample input tensor (batch_size, channels, height, width)
# PyTorch typically expects channels-first format (N, C, H, W)
input_image_batch = torch.randn(16, 3, 32, 32)
# Pass the input through the convolutional layer
output_feature_maps = conv_layer(input_image_batch)
print(f"Input shape: {input_image_batch.shape}")
print(f"Output shape: {output_feature_maps.shape}")
# With the default stride=1 and padding=0, each spatial dimension shrinks: 32 - 5 + 1 = 28
# Expected output:
# Input shape: torch.Size([16, 3, 32, 32])
# Output shape: torch.Size([16, 6, 28, 28])
# Inspect parameters
print(f"\nWeight (filter) shape: {conv_layer.weight.shape}") # (out_channels, in_channels, kernel_H, kernel_W)
print(f"Bias shape: {conv_layer.bias.shape}") # (out_channels)
# Expected output:
# Weight (filter) shape: torch.Size([6, 3, 5, 5])
# Bias shape: torch.Size([6])
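More generally, for an input size $H$, kernel size $k$, padding $p$, and stride $s$ (with the default dilation of 1), each output spatial dimension is $\lfloor (H + 2p - k)/s \rfloor + 1$, which reduces to $32 - 5 + 1 = 28$ in the example above. The sketch below checks this against nn.Conv2d for a few arbitrary settings; conv_output_size is a hypothetical helper written for this illustration, not a PyTorch function:

```python
import torch
import torch.nn as nn

def conv_output_size(size, kernel_size, stride=1, padding=0):
    """Hypothetical helper: output length along one spatial dimension (dilation=1)."""
    return (size + 2 * padding - kernel_size) // stride + 1

x = torch.randn(16, 3, 32, 32)

# (kernel_size, stride, padding) combinations chosen for illustration
settings = [(5, 1, 0), (5, 1, 2), (5, 2, 2), (3, 2, 0)]

for k, s, p in settings:
    conv = nn.Conv2d(in_channels=3, out_channels=6, kernel_size=k, stride=s, padding=p)
    actual = conv(x).shape[-1]
    expected = conv_output_size(32, k, s, p)
    print(f"k={k}, s={s}, p={p}: formula={expected}, actual={actual}")
```

With padding=2 and stride=1, a 5x5 kernel keeps the 32x32 size, while stride=2 roughly halves it (to 16).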
nn.Conv2d and its variants (nn.Conv1d, nn.Conv3d) are essential for tasks involving spatial hierarchies, primarily in image and video analysis, but also sometimes applied to sequential data. We will look closer at constructing CNNs in Chapter 7.
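As a small illustration of applying convolution to sequential data, the sketch below runs nn.Conv1d over a batch of sequences. nn.Conv1d expects input of shape (N, C, L), so the per-step features act as channels and the time steps as the length; a tensor stored as (batch, seq_len, features) must therefore be transposed first. The sizes are arbitrary:

```python
import torch
import torch.nn as nn

# Illustrative sizes: batch of 10 sequences, 20 steps, 5 features per step
x = torch.randn(10, 20, 5)  # (batch, seq_len, features)

# nn.Conv1d expects (N, C, L): features become channels, steps become length
x_channels_first = x.transpose(1, 2)  # (10, 5, 20)

conv1d = nn.Conv1d(in_channels=5, out_channels=8, kernel_size=3)
y = conv1d(x_channels_first)

print(y.shape)  # torch.Size([10, 8, 18]), since 20 - 3 + 1 = 18
```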
Recurrent Layers (nn.RNN)
Recurrent Neural Networks (RNNs) are designed to handle sequential data, where the order of elements matters. Examples include text, time series data, or audio signals. The core idea of an RNN layer is to maintain a hidden state that captures information from previous elements in the sequence, influencing the processing of the current element.
The basic nn.RNN layer in PyTorch processes an input sequence step by step. At each step $t$, it combines the input $x_t$ with the previous hidden state $h_{t-1}$ to compute the new hidden state $h_t$. For nn.RNN, the output at step $t$ is simply the top layer's hidden state $h_t$; many models use only the final hidden state and ignore the per-step outputs.
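With its default tanh nonlinearity, a single-layer nn.RNN computes $h_t = \tanh(x_t W_{ih}^T + b_{ih} + h_{t-1} W_{hh}^T + b_{hh})$. The sketch below unrolls this recurrence by hand using the layer's own parameters (weight_ih_l0, weight_hh_l0, bias_ih_l0, bias_hh_l0, the names PyTorch uses for the first layer) and compares the result with the built-in layer. Sizes and seed are arbitrary:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

rnn = nn.RNN(input_size=5, hidden_size=30, batch_first=True)  # nonlinearity='tanh' by default
x = torch.randn(10, 20, 5)  # (batch, seq_len, input_size)

# Reference result from the built-in layer (initial hidden state defaults to zeros)
output, h_n = rnn(x)

# Manual unrolling with the layer's own first-layer parameters
h = torch.zeros(10, 30)  # h_0
steps = []
for t in range(x.shape[1]):
    x_t = x[:, t, :]  # input at step t: (batch, input_size)
    h = torch.tanh(
        x_t @ rnn.weight_ih_l0.T + rnn.bias_ih_l0
        + h @ rnn.weight_hh_l0.T + rnn.bias_hh_l0
    )
    steps.append(h)
manual_output = torch.stack(steps, dim=1)  # (batch, seq_len, hidden_size)

print(torch.allclose(output, manual_output, atol=1e-5))  # True
print(torch.allclose(h_n[0], h, atol=1e-5))              # True
```

The same weights are reused at every step, which is what lets one layer handle sequences of any length.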
Key parameters for nn.RNN:
input_size: The number of features in the input at each time step.
hidden_size: The number of features in the hidden state.
num_layers (optional, default 1): Number of stacked RNN layers.
batch_first (optional, default False): If True, input and output tensors are provided as (batch, seq_len, features) instead of the default (seq_len, batch, features). The hidden state tensors keep the (num_layers, batch, hidden_size) layout either way.
# Example: Process a batch of 10 sequences, each 20 steps long, with 5 features per step.
# Use a hidden state size of 30.
# Set batch_first=True for easier data handling.
rnn_layer = nn.RNN(input_size=5, hidden_size=30, batch_first=True)
# Create a sample input tensor (batch, seq_len, input_features)
input_sequence_batch = torch.randn(10, 20, 5)
# Initialize the hidden state (num_layers, batch, hidden_size)
# If not provided, it defaults to zeros.
initial_hidden_state = torch.randn(1, 10, 30) # num_layers=1
# Pass the input sequence and initial hidden state through the RNN
# output_sequence holds the top layer's hidden state at every time step
# final_hidden_state holds every layer's hidden state at the last time step
output_sequence, final_hidden_state = rnn_layer(input_sequence_batch, initial_hidden_state)
print(f"Input shape: {input_sequence_batch.shape}")
print(f"Initial hidden state shape: {initial_hidden_state.shape}")
print(f"Output sequence shape: {output_sequence.shape}") # (batch, seq_len, hidden_size)
print(f"Final hidden state shape: {final_hidden_state.shape}") # (num_layers, batch, hidden_size)
# Expected output:
# Input shape: torch.Size([10, 20, 5])
# Initial hidden state shape: torch.Size([1, 10, 30])
# Output sequence shape: torch.Size([10, 20, 30])
# Final hidden state shape: torch.Size([1, 10, 30])
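To see how num_layers affects the returned tensors, the sketch below stacks two RNN layers (an arbitrary choice for illustration) and omits the initial hidden state so that it defaults to zeros. For a unidirectional RNN, the last time step of the output sequence is the same as the top layer's entry in the final hidden state:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Two stacked layers; other sizes match the example above
rnn = nn.RNN(input_size=5, hidden_size=30, num_layers=2, batch_first=True)
x = torch.randn(10, 20, 5)

# No initial hidden state passed, so it defaults to zeros
output, h_n = rnn(x)

print(output.shape)  # torch.Size([10, 20, 30]): top layer, every time step
print(h_n.shape)     # torch.Size([2, 10, 30]): every layer, last time step

# The last time step of `output` matches the top layer's final hidden state
print(torch.allclose(output[:, -1, :], h_n[-1]))  # True
```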
While nn.RNN demonstrates the basic concept, simple RNNs often struggle with long sequences due to vanishing gradients. More advanced recurrent layers like nn.LSTM (Long Short-Term Memory) and nn.GRU (Gated Recurrent Unit) are typically preferred in practice as they incorporate gating mechanisms to better manage information flow over long dependencies. These will be mentioned again in Chapter 7.
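Both layers take the same constructor arguments as nn.RNN, but their return values differ slightly: nn.LSTM also tracks a cell state and returns it alongside the hidden state, while nn.GRU returns outputs and a final hidden state just like nn.RNN. A minimal sketch with arbitrary sizes:

```python
import torch
import torch.nn as nn

x = torch.randn(10, 20, 5)  # (batch, seq_len, features)

lstm = nn.LSTM(input_size=5, hidden_size=30, batch_first=True)
gru = nn.GRU(input_size=5, hidden_size=30, batch_first=True)

# LSTM returns the outputs plus a (hidden state, cell state) tuple
lstm_out, (h_n, c_n) = lstm(x)
# GRU returns the outputs plus the final hidden state, like nn.RNN
gru_out, gru_h_n = gru(x)

print(lstm_out.shape, h_n.shape, c_n.shape)
# torch.Size([10, 20, 30]) torch.Size([1, 10, 30]) torch.Size([1, 10, 30])
print(gru_out.shape, gru_h_n.shape)
# torch.Size([10, 20, 30]) torch.Size([1, 10, 30])
```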
These three layer types (Linear, Conv2d, RNN) represent fundamental operations for different kinds of data and tasks. torch.nn provides these and many others (such as pooling, normalization, and dropout layers), which can be combined within nn.Module subclasses to build sophisticated deep learning models. In the following sections, we will see how to combine these layers with non-linear activation functions and how to train them using loss functions and optimizers.
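As a preview of how layers fit together inside an nn.Module subclass, the sketch below chains a Conv2d feature extractor and a Linear classification head. TinyConvNet and its sizes are hypothetical choices for illustration; activation functions are deliberately left out here because they are introduced in the following sections:

```python
import torch
import torch.nn as nn

class TinyConvNet(nn.Module):
    """Hypothetical example model: Conv2d feature extractor + Linear head."""

    def __init__(self, num_classes=10):
        super().__init__()
        # 3x32x32 input -> 6x28x28 feature maps (kernel 5, stride 1, no padding)
        self.conv = nn.Conv2d(in_channels=3, out_channels=6, kernel_size=5)
        # Flattened feature maps -> one score per class
        self.fc = nn.Linear(in_features=6 * 28 * 28, out_features=num_classes)

    def forward(self, x):
        x = self.conv(x)                   # (N, 6, 28, 28)
        x = torch.flatten(x, start_dim=1)  # (N, 6 * 28 * 28)
        return self.fc(x)                  # (N, num_classes)

model = TinyConvNet()
scores = model(torch.randn(16, 3, 32, 32))
print(scores.shape)  # torch.Size([16, 10])
```

Without a non-linearity between them, the convolution and the linear layer compose into a single linear map, which is exactly why activation functions are the next ingredient.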
© 2025 ApX Machine Learning