When transitioning from TensorFlow Keras to PyTorch, understanding how network components are defined is a primary step. In Keras, tf.keras.layers.Layer
serves as the base class for all layers, which are the fundamental building blocks of models. PyTorch offers a similar, yet distinct, foundation with its torch.nn.Module
class. Every neural network module, from a single layer to an entire complex model, is a subclass of nn.Module
. This class provides the core functionality needed for building neural networks: it can register and manage parameters (like weights and biases), hold submodules (other nn.Module
instances), and define the forward computation.
The torch.nn.Module Class: Your Network's Blueprint
Think of torch.nn.Module as the blueprint for any part of your neural network that has learnable parameters or performs a distinct step in the computation. Its main responsibilities include:
- An nn.Module can contain torch.nn.Parameter instances. These are special tensors that are automatically registered as model parameters. When you ask a model for its parameters (e.g., to pass to an optimizer), nn.Module ensures all such registered parameters, including those in nested submodules, are accessible.
- An nn.Module can contain other nn.Module instances as attributes. This allows you to build complex architectures by composing simpler modules in a hierarchical fashion.
- An nn.Module subclass typically implements a forward() method. This method takes input tensors and performs the defined operations, returning output tensors. This is directly analogous to the call() method in a Keras Layer.

While Keras layers have a build() method often used for deferred weight creation (i.e., creating weights when the input shape is first known), PyTorch modules typically define their layers, and thus their parameters' shapes, directly in the __init__() constructor. Weights for pre-built layers like nn.Linear are created upon instantiation, requiring you to specify input feature dimensions at that time.
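As a brief illustration of this difference, instantiating nn.Linear allocates its weight and bias tensors immediately; the variable name and dimensions below are arbitrary:

import torch.nn as nn

layer = nn.Linear(10, 5)    # in_features must be known at construction time
print(layer.weight.shape)   # torch.Size([5, 10]) -- created upon instantiation
print(layer.bias.shape)     # torch.Size([5])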
Defining Layers in __init__
In PyTorch, the standard practice is to define and initialize all constituent layers and parameters of your module within its __init__
constructor. When you assign an instance of another nn.Module
(like nn.Linear
or nn.Conv2d
) as an attribute of your custom module, PyTorch automatically recognizes it as a submodule.
Let's look at a simple example. If you want to create a module that contains a linear layer and a ReLU activation:
import torch
import torch.nn as nn

class SimplePyTorchModule(nn.Module):
    def __init__(self, input_features, output_features):
        super(SimplePyTorchModule, self).__init__()
        # Define submodules (layers) here
        self.linear_layer = nn.Linear(input_features, output_features)
        self.activation = nn.ReLU()
        # Example of a custom learnable parameter
        # self.my_custom_bias = nn.Parameter(torch.zeros(output_features))

    def forward(self, x):
        # Define the computation flow using the submodules
        x = self.linear_layer(x)
        x = self.activation(x)
        # If self.my_custom_bias was defined:
        # x = x + self.my_custom_bias
        return x

# Instantiate the module
module = SimplePyTorchModule(input_features=10, output_features=5)
print(module)
Output:
SimplePyTorchModule(
  (linear_layer): Linear(in_features=10, out_features=5, bias=True)
  (activation): ReLU()
)
In this __init__ method:

- super(SimplePyTorchModule, self).__init__() is essential to call the constructor of the base nn.Module class.
- self.linear_layer = nn.Linear(input_features, output_features) creates an instance of PyTorch's pre-built linear layer and assigns it as an attribute. Its parameters (weights and biases) are automatically registered with SimplePyTorchModule.
- self.activation = nn.ReLU() creates an instance of the ReLU activation function. ReLU itself doesn't have learnable parameters, but it's still an nn.Module.

This is similar to how you might define layers within the __init__ or build method of a custom tf.keras.Layer subclass. However, the composition into a larger model often differs. In Keras, you might add these to a tf.keras.Sequential model or connect them using the Functional API. In PyTorch, you are typically building up these nn.Module structures directly by nesting.
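For comparison, a rough Keras counterpart to SimplePyTorchModule, written as a subclassed layer, might look like the sketch below. The class name is invented for this illustration, and weight creation is deferred until the first call, as is typical in Keras:

import tensorflow as tf

class SimpleKerasLayer(tf.keras.layers.Layer):  # hypothetical illustrative layer
    def __init__(self, output_features, **kwargs):
        super().__init__(**kwargs)
        # Dense does not need the input size here; its weights are built lazily
        self.dense = tf.keras.layers.Dense(output_features)
        self.activation = tf.keras.layers.ReLU()

    def call(self, inputs):  # analogous to PyTorch's forward()
        return self.activation(self.dense(inputs))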
The forward() Method
The forward() method is where you define the actual computation your module performs. It takes one or more input tensors and returns one or more output tensors. You use the layers and parameters defined in __init__ to transform the input.
Continuing with SimplePyTorchModule:

# (inside SimplePyTorchModule class)
# def forward(self, x):
#     x = self.linear_layer(x)
#     x = self.activation(x)
#     return x
Here, self.linear_layer(x)
calls the forward
method of the nn.Linear
instance. nn.Module
instances are callable; when you call module_instance(input)
, it internally calls module_instance.forward(input)
.
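As a minimal usage sketch, assuming the module instance created above, you can pass a random batch through it and check the output shape; the batch size here is arbitrary:

inputs = torch.randn(4, 10)   # batch of 4 samples, 10 features each
outputs = module(inputs)      # preferred: __call__ dispatches to forward() and runs hooks
# outputs = module.forward(inputs)  # works, but bypasses PyTorch's hook machinery
print(outputs.shape)          # torch.Size([4, 5])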
A significant aspect of PyTorch is its dynamic computation graph (define-by-run). The forward()
method is just Python code. This means you can use standard Python control flow (loops, conditionals) to define complex, adaptive computations. The graph of operations is built on-the-fly as the forward()
method executes. This contrasts with TensorFlow's traditional graph mode (define-and-run), where the graph is typically defined statically first (though TensorFlow's Eager Execution, default in TF 2.x, behaves more like PyTorch's define-by-run).
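To illustrate, here is a minimal sketch of data-dependent control flow inside forward(); the module name, feature size, and repeat count are invented for this example:

import torch
import torch.nn as nn

class DynamicDepthModule(nn.Module):  # hypothetical example module
    def __init__(self, features, max_repeats=3):
        super().__init__()
        self.layer = nn.Linear(features, features)
        self.max_repeats = max_repeats

    def forward(self, x):
        # Ordinary Python loop and conditional; the graph is built as this code runs
        for _ in range(self.max_repeats):
            x = torch.relu(self.layer(x))
            if x.abs().mean() < 0.1:  # data-dependent early exit
                break
        return x

out = DynamicDepthModule(features=8)(torch.randn(2, 8))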
Custom Learnable Parameters with torch.nn.Parameter
While pre-built layers like nn.Linear
manage their own parameters, sometimes you need to define your own custom learnable parameters directly within your module. To do this, you wrap a torch.Tensor
with torch.nn.Parameter
. This tells PyTorch that this tensor should be treated as a model parameter, meaning its requires_grad
attribute will be set to True
by default (so gradients will be computed for it during backpropagation), and it will be included in the list returned by model.parameters()
.
import torch
import torch.nn as nn

class ModuleWithCustomParameter(nn.Module):
    def __init__(self, num_features):
        super(ModuleWithCustomParameter, self).__init__()
        # A learnable scaling factor for each feature
        self.scale = nn.Parameter(torch.ones(num_features))
        # A learnable bias for each feature
        self.bias = nn.Parameter(torch.zeros(num_features))

    def forward(self, x):
        # x is expected to have shape (batch_size, num_features)
        return x * self.scale + self.bias

custom_param_module = ModuleWithCustomParameter(5)

# You can inspect its parameters:
for name, param in custom_param_module.named_parameters():
    print(f"{name}: {param.data}")
This is analogous to using self.add_weight()
in a Keras Layer
's build
method to create and register trainable weights.
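For reference, a rough Keras sketch of the same idea, using add_weight inside build(); the class name is invented for this illustration:

import tensorflow as tf

class KerasScaleBias(tf.keras.layers.Layer):  # hypothetical illustrative layer
    def build(self, input_shape):
        num_features = input_shape[-1]
        # add_weight creates and registers trainable variables with the layer
        self.scale = self.add_weight(shape=(num_features,), initializer="ones", trainable=True)
        self.bias = self.add_weight(shape=(num_features,), initializer="zeros", trainable=True)

    def call(self, inputs):
        return inputs * self.scale + self.bias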
One of the powerful features of nn.Module
is its ability to contain other nn.Module
instances. This allows you to build complex models by composing simpler, reusable blocks in a hierarchical manner. A parent module automatically discovers and manages all parameters from its nested child modules.
Consider building a slightly more complex network that uses the SimplePyTorchModule
we defined earlier as a building block:
class AdvancedNetwork(nn.Module):
    def __init__(self, input_dim, hidden_dim1, hidden_dim2, output_dim):
        super(AdvancedNetwork, self).__init__()
        # Nesting SimplePyTorchModule as a "block"
        self.block1 = SimplePyTorchModule(input_dim, hidden_dim1)
        self.intermediate_linear = nn.Linear(hidden_dim1, hidden_dim2)
        self.relu = nn.ReLU()
        self.output_layer = nn.Linear(hidden_dim2, output_dim)

    def forward(self, x):
        x = self.block1(x)  # Uses the forward pass of SimplePyTorchModule
        x = self.intermediate_linear(x)
        x = self.relu(x)
        x = self.output_layer(x)
        return x

# Example instantiation
adv_net = AdvancedNetwork(input_dim=20, hidden_dim1=15, hidden_dim2=10, output_dim=2)
print(adv_net)
Output:
AdvancedNetwork(
  (block1): SimplePyTorchModule(
    (linear_layer): Linear(in_features=20, out_features=15, bias=True)
    (activation): ReLU()
  )
  (intermediate_linear): Linear(in_features=15, out_features=10, bias=True)
  (relu): ReLU()
  (output_layer): Linear(in_features=10, out_features=2, bias=True)
)
As you can see, AdvancedNetwork
contains block1
, which is an instance of SimplePyTorchModule
. All parameters from block1.linear_layer
, intermediate_linear
, and output_layer
will be part of adv_net.parameters()
. This composability is central to organizing PyTorch code effectively.
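To see this aggregation in practice, here is a short sketch that inspects adv_net as defined above; the counting logic is just for illustration:

# Parameters from the nested block1 appear under prefixed names such as "block1.linear_layer.weight"
for name, param in adv_net.named_parameters():
    print(name, tuple(param.shape))

total = sum(p.numel() for p in adv_net.parameters())
print(f"Total trainable parameters: {total}")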
The following diagram illustrates the structure of a custom PyTorch nn.Module
that contains other modules (both pre-built and potentially other custom ones) and parameters.
A PyTorch nn.Module (MyNetwork) defining its components (submodules like CustomBlockA, nn.ReLU, nn.Linear, and direct nn.Parameters like global_bias) in __init__, and their computational flow in forward. CustomBlockA itself is another nn.Module, demonstrating nesting.
Pre-built Layers in torch.nn
PyTorch comes with a rich set of pre-built layers in the torch.nn package, such as:

- nn.Linear: Fully connected layer.
- nn.Conv1d, nn.Conv2d, nn.Conv3d: Convolutional layers for different dimensionalities.
- nn.RNN, nn.LSTM, nn.GRU: Recurrent layers.
- nn.BatchNorm1d, nn.BatchNorm2d: Batch normalization layers.
- nn.Dropout: Dropout layer.
- nn.ReLU, nn.Sigmoid, nn.Tanh, nn.Softmax (though many activations are also available in torch.nn.functional and can be applied directly in the forward method).

These pre-built layers are all subclasses of nn.Module themselves. You use them by instantiating them in your module's __init__ method and then calling them in the forward method with the appropriate input. This is analogous to how you would use layers from tf.keras.layers in TensorFlow. Subsequent sections will cover these common layer types in more detail, comparing their PyTorch implementations to their Keras counterparts.
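As an illustration of the torch.nn.functional alternative mentioned above, the two sketches below are functionally equivalent; the class names and dimensions are invented for this example:

import torch
import torch.nn as nn
import torch.nn.functional as F

class WithModuleActivation(nn.Module):  # hypothetical example
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(16, 8)
        self.relu = nn.ReLU()        # activation held as a submodule

    def forward(self, x):
        return self.relu(self.fc(x))

class WithFunctionalActivation(nn.Module):  # hypothetical example
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(16, 8)   # no activation module needed

    def forward(self, x):
        return F.relu(self.fc(x))    # stateless function applied in forward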
The shift from Keras's Layer
to PyTorch's nn.Module
involves embracing a structure where you explicitly define the components of your network modules in __init__
and their computational flow in forward
. This explicit definition offers fine-grained control and integrates naturally with Python's dynamic capabilities, making it straightforward to build and debug even very complex model architectures.