Okay, you understand the components of a neural network: layers composed of neurons, connected by weights, using activation functions to introduce non-linearity. Now, how do you translate that understanding into actual code that can be trained? This is where deep learning frameworks come into play.
Frameworks like PyTorch and TensorFlow (often used via its higher-level API, Keras) provide the building blocks and automation necessary to construct, train, and evaluate neural networks efficiently. They handle complex operations like gradient computation (automatic differentiation) and leverage hardware acceleration (such as GPUs) behind the scenes, letting you focus on the network's design and the training process. While the concepts transfer across frameworks, we'll use PyTorch for our examples here.
torch.nn.Sequential
For many common feedforward networks, the architecture is a simple linear stack of layers: the output of one layer feeds directly into the input of the next. PyTorch offers a convenient container, torch.nn.Sequential, precisely for this purpose. You build the model by passing an ordered sequence of layer modules to its constructor.
Let's look at the fundamental building blocks you'll use:
Linear layers (torch.nn.Linear): These are the densely connected layers, also known as fully connected layers in other contexts. They apply a linear transformation to the incoming data: y = xW^T + b. When defining a Linear layer, you must specify the number of input features (in_features) and the number of output features (out_features).
import torch
import torch.nn as nn
# Example: A linear layer mapping 784 input features to 128 output features
layer1 = nn.Linear(in_features=784, out_features=128)
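To see what this layer does to a batch of data, you can run a quick check; continuing from the snippet above, the batch size of 32 below is just an illustrative choice:
# Continuing from the snippet above: pass a dummy batch of 32 samples through the layer
x = torch.randn(32, 784)   # 32 samples, each with 784 features
out = layer1(x)
print(out.shape)           # torch.Size([32, 128]) -- features mapped from 784 to 128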
Activation layers (torch.nn.ReLU, torch.nn.Sigmoid, etc.): These modules apply the non-linear activation functions element-wise to the output of the preceding layer. They typically don't change the shape of the data, so you don't need to specify input/output sizes when defining them as separate layers.
# Example: Applying the ReLU activation function
activation1 = nn.ReLU()
Now, let's combine these into a simple two-layer network using nn.Sequential. Imagine we have input data with 784 features (like a flattened 28x28 image from the MNIST dataset) and we want to classify it into 10 categories. Our network could have one hidden layer with 128 neurons and a ReLU activation, followed by an output layer with 10 neurons (one for each class).
import torch
import torch.nn as nn
# Define the model architecture as a sequence of layers
model = nn.Sequential(
    nn.Linear(in_features=784, out_features=128),  # Input layer (784) to Hidden layer (128)
    nn.ReLU(),                                     # Activation for hidden layer
    nn.Linear(in_features=128, out_features=10)    # Hidden layer (128) to Output layer (10)
    # Note: Activation for the output layer (e.g., Softmax) is often applied
    # separately or integrated into the loss function for numerical stability,
    # especially for classification tasks.
)
# Print the model structure to verify
print(model)
Executing this code will print a summary of the model:
Sequential(
  (0): Linear(in_features=784, out_features=128, bias=True)
  (1): ReLU()
  (2): Linear(in_features=128, out_features=10, bias=True)
)
Notice the critical detail: the out_features of one Linear layer must match the in_features of the next Linear layer (after accounting for any intermediate layers, like activations, that don't change the feature dimension). PyTorch automatically creates and initializes the weight matrices (W) and bias vectors (b) for each Linear layer when you define them this way.
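As a quick sanity check (not part of the original listing), you can inspect the automatically created parameters and pass a dummy batch through the model to confirm the dimensions line up; the batch size of 32 is an arbitrary choice:
# Inspect the automatically created parameters of the first Linear layer
print(model[0].weight.shape)  # torch.Size([128, 784])
print(model[0].bias.shape)    # torch.Size([128])

# Forward a dummy batch to verify the end-to-end dimensions
dummy_input = torch.randn(32, 784)
output = model(dummy_input)
print(output.shape)           # torch.Size([32, 10])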
Here's a visual representation of this simple network architecture:
A simple feedforward network with one hidden layer, defined using sequential building blocks. Data flows from input features, through a linear transformation, a ReLU activation, another linear transformation, to the final output scores.
torch.nn.Module
The nn.Sequential approach is excellent for linear stacks, but deep learning architectures can be more complex. You might need skip connections (where the output of an earlier layer is added to the output of a later layer, common in ResNets), multiple inputs or outputs, or layers that share weights across different parts of the network. For these scenarios, PyTorch provides a more flexible and powerful method: defining your model by subclassing the base torch.nn.Module class.
This involves two primary steps:
__init__: In the constructor (__init__) of your custom class, you define all the layers your network will use as class attributes. You instantiate modules like nn.Linear, nn.ReLU, nn.Conv2d (for CNNs), etc., here. This is where the layers (and their parameters) are created. It's important to call super().__init__() first in your constructor to ensure the base class is properly initialized.
forward: You implement the forward method, which takes the input tensor(s) as arguments and explicitly defines how data flows through the layers you defined in __init__. You call the layers on the input data in the desired sequence, allowing for complex routing, branching, or merging of data streams.
Let's redefine our previous simple network using this approach for comparison:
import torch
import torch.nn as nn
import torch.nn.functional as F # Often used for stateless functions like activations
class SimpleNet(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        # Call the parent class (nn.Module) constructor
        super(SimpleNet, self).__init__()
        # Define the layers as attributes of this class
        self.layer1 = nn.Linear(input_size, hidden_size)
        self.layer2 = nn.Linear(hidden_size, output_size)
        # We could also define activation layers here if preferred:
        # self.relu = nn.ReLU()

    def forward(self, x):
        # Define the data flow through the network
        # Pass input through the first linear layer
        x = self.layer1(x)
        # Apply ReLU activation. Here we use the functional version.
        x = F.relu(x)
        # Pass the result through the second linear layer
        x = self.layer2(x)
        # Output activation (e.g., softmax) might be applied here or later in the loss function
        return x
# Instantiate the model with specific sizes
input_features = 784
hidden_neurons = 128
output_classes = 10
model_custom = SimpleNet(input_features, hidden_neurons, output_classes)
# Print the model structure. It shows the layers defined in __init__
print(model_custom)
Running this code produces a similar summary, listing the defined layers:
SimpleNet(
  (layer1): Linear(in_features=784, out_features=128, bias=True)
  (layer2): Linear(in_features=128, out_features=10, bias=True)
)
In this forward method, we used F.relu from the torch.nn.functional module. This module provides functional versions of many common operations, including activation functions. These functions are "stateless," meaning they don't have associated parameters (weights or biases) that need to be tracked or updated during training. Using F.relu is often slightly more concise than defining self.relu = nn.ReLU() in __init__ and then calling x = self.relu(x) in forward. However, both approaches are valid.
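For comparison, here is a minimal sketch of the same network using the module-based nn.ReLU stored in __init__ rather than the functional F.relu; the class name below is just an illustrative choice, and both variants compute the same result:
import torch.nn as nn

class SimpleNetModuleReLU(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.layer1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()      # activation stored as a module attribute
        self.layer2 = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        x = self.layer1(x)
        x = self.relu(x)           # call the stored module instead of F.relu
        x = self.layer2(x)
        return x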
While slightly more verbose for simple sequential models, subclassing nn.Module gives you complete control over the forward pass logic. This makes it indispensable for implementing more advanced and custom network architectures you'll encounter later, such as Convolutional Neural Networks (CNNs) with their specific layer arrangements and Recurrent Neural Networks (RNNs) that involve looping structures.
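To make that flexibility concrete, here is a minimal sketch of a block with a skip connection, the kind of data flow nn.Sequential cannot express on its own; the class name and sizes below are illustrative assumptions, not part of the original examples:
import torch
import torch.nn as nn
import torch.nn.functional as F

class SkipBlock(nn.Module):
    def __init__(self, features):
        super().__init__()
        self.linear1 = nn.Linear(features, features)
        self.linear2 = nn.Linear(features, features)

    def forward(self, x):
        # Transform the input through two linear layers with a ReLU in between
        out = F.relu(self.linear1(x))
        out = self.linear2(out)
        # Skip connection: add the original input back to the transformed output
        return F.relu(out + x)

# The feature dimension is preserved, so the input can be added to the output
block = SkipBlock(128)
print(block(torch.randn(32, 128)).shape)  # torch.Size([32, 128])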
Choosing between nn.Sequential and subclassing nn.Module depends on the complexity of your desired architecture. Start with nn.Sequential for straightforward feedforward designs, as it's often quicker and easier to read. Move to subclassing nn.Module when you need more flexibility to define non-linear data flows, custom operations, or shared components within your network. Regardless of the method, accurately defining the layers and ensuring the input/output dimensions align correctly is fundamental to building functional neural network models.