Neural networks are built from fundamental components: layers of neurons, connected by adjustable weights, and incorporating activation functions for non-linearity. Translating these network concepts into executable, trainable code requires powerful tools. This is where deep learning frameworks become essential.
Frameworks like PyTorch and TensorFlow (often used via its higher-level API, Keras) provide the building blocks and automation necessary to construct, train, and evaluate neural networks efficiently. They handle complex operations like gradient calculation (autodiff) and leverage hardware acceleration (like GPUs) behind the scenes, letting you focus on the network's design and the training process. While the concepts are transferable, we'll use PyTorch for our examples here.
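To make the autodiff point concrete, here is a minimal sketch (with arbitrary tensor values, not part of any model) showing PyTorch tracking gradients through a tiny computation:

import torch

# A tensor with requires_grad=True tells autograd to track operations on it
w = torch.tensor([2.0, -1.0], requires_grad=True)
x = torch.tensor([3.0, 4.0])

# A simple scalar computation: loss = sum(w * x)
loss = (w * x).sum()

# backward() computes d(loss)/d(w) automatically
loss.backward()
print(w.grad)  # tensor([3., 4.]) -- the gradient of loss with respect to w equals x here

This automatic bookkeeping is what lets you define a network's layers and let the framework work out all the gradient computations needed for training.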
torch.nn.Sequential
For many common feedforward networks, the architecture is a simple linear stack of layers: the output of one layer feeds directly into the input of the next. PyTorch offers a convenient container called torch.nn.Sequential precisely for this purpose. You define the model by passing an ordered sequence of layer modules to its constructor.
Let's look at the fundamental building blocks you'll use:
Linear layers (torch.nn.Linear): These are the densely connected layers, also known as fully connected layers. They apply a linear transformation to the incoming data: y = xWᵀ + b. When defining a Linear layer, you must specify the number of input features (in_features) and the number of output features (out_features).
import torch
import torch.nn as nn
# Example: A linear layer mapping 784 input features to 128 output features
layer1 = nn.Linear(in_features=784, out_features=128)
Activation functions (torch.nn.ReLU, torch.nn.Sigmoid, etc.): These modules apply a non-linear activation function element-wise to the output of the preceding layer. They typically don't change the shape of the data, so you don't need to specify input/output sizes when defining them as separate layers.
# Example: Applying the ReLU activation function
activation1 = nn.ReLU()
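As a quick illustration (the input values here are arbitrary), applying the module zeroes out negative entries while leaving the shape unchanged:

# Apply ReLU to a small example tensor: negatives become 0, shape is preserved
x = torch.tensor([[-1.5, 0.0, 2.0]])
print(activation1(x))        # tensor([[0., 0., 2.]])
print(activation1(x).shape)  # torch.Size([1, 3])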
Now, let's combine these into a simple two-layer network using nn.Sequential. Imagine we have input data with 784 features (like a flattened 28x28 image from the MNIST dataset) and we want to classify it into 10 categories. Our network could have one hidden layer with 128 neurons and a ReLU activation, followed by an output layer with 10 neurons (one for each class).
import torch
import torch.nn as nn
# Define the model architecture as a sequence of layers
model = nn.Sequential(
    nn.Linear(in_features=784, out_features=128),  # Input layer (784) to Hidden layer (128)
    nn.ReLU(),                                     # Activation for hidden layer
    nn.Linear(in_features=128, out_features=10)    # Hidden layer (128) to Output layer (10)
    # Note: Activation for the output layer (e.g., Softmax) is often applied
    # separately or integrated into the loss function for numerical stability,
    # especially for classification tasks.
)
# Print the model structure to verify
print(model)
Executing this code will print a summary of the model:
Sequential(
  (0): Linear(in_features=784, out_features=128, bias=True)
  (1): ReLU()
  (2): Linear(in_features=128, out_features=10, bias=True)
)
Notice the critical detail: the out_features of one Linear layer must match the in_features of the next Linear layer (after accounting for any intermediate layers like activations that don't change the feature dimension). PyTorch automatically creates and initializes the weight matrices (W) and bias vectors (b) for each Linear layer when you define them this way.
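To check that the dimensions line up, you can pass a dummy batch through the model and inspect the output shape. Because the model returns raw scores (logits), a loss such as nn.CrossEntropyLoss can be applied to them directly; it applies the softmax internally. This is a sanity-check sketch with random data, not part of a training loop:

# A dummy batch of 32 samples, each with 784 features
batch = torch.randn(32, 784)

# Forward pass: one score (logit) per class for each sample
logits = model(batch)
print(logits.shape)  # torch.Size([32, 10])

# CrossEntropyLoss applies log-softmax internally, so the model
# can output raw logits (random targets here, just for the shape check)
targets = torch.randint(0, 10, (32,))
loss_fn = nn.CrossEntropyLoss()
print(loss_fn(logits, targets).item())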
Here's a visual representation of this simple network architecture:
A simple feedforward network with one hidden layer, defined using sequential building blocks. Data flows from input features, through a linear transformation, a ReLU activation, another linear transformation, to the final output scores.
torch.nn.Module
The nn.Sequential approach is excellent for linear stacks, but deep learning architectures can be more complex. You might need skip connections (where the output of an earlier layer is added to the output of a later layer, common in ResNets), multiple inputs or outputs, or layers that share weights across different parts of the network. For these scenarios, PyTorch provides a more flexible and powerful method: defining your model by subclassing the base torch.nn.Module class.
This involves two primary steps:
1. Define the layers in __init__: In the constructor (__init__) of your custom class, you define all the layers your network will use as class attributes. You instantiate modules like nn.Linear, nn.ReLU, nn.Conv2d (for CNNs), etc., here. This is where the layers (and their parameters) are created. It's important to call super().__init__() first in your constructor to ensure the base class is properly initialized.
2. Implement forward: The forward method takes the input tensor(s) as arguments and explicitly defines how data flows through the layers you defined in __init__. You call the layers on the input data in the desired sequence, allowing for complex routing, branching, or merging of data streams.
Let's redefine our previous simple network using this approach for comparison:
import torch
import torch.nn as nn
import torch.nn.functional as F # Often used for stateless functions like activations
class SimpleNet(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        # Call the parent class (nn.Module) constructor
        super().__init__()
        # Define the layers as attributes of this class
        self.layer1 = nn.Linear(input_size, hidden_size)
        self.layer2 = nn.Linear(hidden_size, output_size)
        # We could also define activation layers here if preferred:
        # self.relu = nn.ReLU()

    def forward(self, x):
        # Define the data flow through the network
        # Pass input through the first linear layer
        x = self.layer1(x)
        # Apply ReLU activation. Here we use the functional version.
        x = F.relu(x)
        # Pass the result through the second linear layer
        x = self.layer2(x)
        # Output activation (e.g., softmax) might be applied here or later in the loss function
        return x
# Instantiate the model with specific sizes
input_features = 784
hidden_neurons = 128
output_classes = 10
model_custom = SimpleNet(input_features, hidden_neurons, output_classes)
# Print the model structure. It shows the layers defined in __init__
print(model_custom)
Running this code produces a similar summary, listing the defined layers:
SimpleNet(
  (layer1): Linear(in_features=784, out_features=128, bias=True)
  (layer2): Linear(in_features=128, out_features=10, bias=True)
)
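Because the layers were registered as attributes in __init__, nn.Module automatically tracks their parameters, which is what an optimizer will later receive. A quick inspection sketch makes this visible:

# Each Linear layer contributes a weight matrix and a bias vector
for name, param in model_custom.named_parameters():
    print(name, tuple(param.shape))
# layer1.weight (128, 784)
# layer1.bias   (128,)
# layer2.weight (10, 128)
# layer2.bias   (10,)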
In this forward method, we used F.relu from the torch.nn.functional module. This module provides functional versions of many common operations, including activation functions. These functions are "stateless," meaning they have no associated parameters (weights or biases) that need to be tracked or updated during training. Using F.relu is often slightly more concise than defining self.relu = nn.ReLU() in __init__ and then calling x = self.relu(x) in forward. However, both approaches are valid.
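For example (with an arbitrary input tensor), the functional call and the module produce identical outputs; the only practical difference is whether the activation appears as a module in the model summary:

x = torch.tensor([-2.0, 0.5, 3.0])
print(F.relu(x))      # tensor([0.0000, 0.5000, 3.0000])
print(nn.ReLU()(x))   # tensor([0.0000, 0.5000, 3.0000])
print(torch.equal(F.relu(x), nn.ReLU()(x)))  # True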
While slightly more verbose for simple sequential models, subclassing nn.Module gives you complete control over the forward pass logic. This makes it indispensable for implementing more advanced and custom network architectures you'll encounter later, such as Convolutional Neural Networks (CNNs) with their specific layer arrangements and Recurrent Neural Networks (RNNs) that involve looping structures.
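As a small illustration of the kind of flexibility nn.Sequential cannot express, here is a sketch of a block with a skip connection. The class name SkipBlock and the layer sizes are hypothetical, loosely in the spirit of residual blocks:

import torch
import torch.nn as nn
import torch.nn.functional as F

class SkipBlock(nn.Module):
    """A tiny block whose output adds its input back in (a skip connection)."""
    def __init__(self, features):
        super().__init__()
        self.fc1 = nn.Linear(features, features)
        self.fc2 = nn.Linear(features, features)

    def forward(self, x):
        # The branch transforms x...
        out = F.relu(self.fc1(x))
        out = self.fc2(out)
        # ...and the skip connection adds the original input back in,
        # a data flow a plain nn.Sequential stack cannot describe
        return F.relu(out + x)

block = SkipBlock(features=128)
print(block(torch.randn(4, 128)).shape)  # torch.Size([4, 128])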
Choosing between nn.Sequential and subclassing nn.Module depends on the complexity of your desired architecture. Start with nn.Sequential for straightforward feedforward designs, as it's often quicker and easier to read. Move to subclassing nn.Module when you need more flexibility to define non-linear data flows, custom operations, or shared components within your network. Regardless of the method, accurately defining the layers and ensuring the input/output dimensions align correctly is fundamental to building functional neural network models.