When transitioning from TensorFlow Keras to PyTorch, understanding how network components are defined is a primary step. In Keras, tf.keras.layers.Layer
serves as the base class for all layers, which are the fundamental building blocks of models. PyTorch offers a similar, yet distinct, foundation with its torch.nn.Module
class. Every neural network module, from a single layer to an entire complex model, is a subclass of nn.Module
. This class provides the core functionality needed for building neural networks: it can register and manage parameters (like weights and biases), hold submodules (other nn.Module
instances), and define the forward computation.
The torch.nn.Module Class: Your Network's Blueprint
Think of torch.nn.Module as the blueprint for any part of your neural network that has learnable parameters or performs a distinct step in the computation. Its main responsibilities include:
- An nn.Module can contain torch.nn.Parameter instances. These are special tensors that are automatically registered as model parameters. When you ask a model for its parameters (e.g., to pass to an optimizer), nn.Module ensures all such registered parameters, including those in nested submodules, are accessible.
- An nn.Module can contain other nn.Module instances as attributes. This allows you to build complex architectures by composing simpler modules in a hierarchical fashion.
- An nn.Module subclass typically implements a forward() method. This method takes input tensors and performs the defined operations, returning output tensors. This is directly analogous to the call() method in a Keras Layer.

While Keras layers have a build() method often used for deferred weight creation (i.e., creating weights when the input shape is first known), PyTorch modules typically define their layers, and thus their parameters' shapes, directly in the __init__() constructor. Weights for pre-built layers like nn.Linear are created upon instantiation, requiring you to specify input feature dimensions at that time.
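As a brief illustration of this difference, instantiating nn.Linear allocates its weight and bias tensors immediately; the variable name and dimensions below are arbitrary:

import torch.nn as nn

layer = nn.Linear(10, 5)    # in_features must be known at construction time
print(layer.weight.shape)   # torch.Size([5, 10]) -- created upon instantiation
print(layer.bias.shape)     # torch.Size([5])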
Defining Layers in __init__
In PyTorch, the standard practice is to define and initialize all constituent layers and parameters of your module within its __init__
constructor. When you assign an instance of another nn.Module
(like nn.Linear
or nn.Conv2d
) as an attribute of your custom module, PyTorch automatically recognizes it as a submodule.
Let's look at a simple example. If you want to create a module that contains a linear layer and a ReLU activation:
import torch
import torch.nn as nn

class SimplePyTorchModule(nn.Module):
    def __init__(self, input_features, output_features):
        super(SimplePyTorchModule, self).__init__()
        # Define submodules (layers) here
        self.linear_layer = nn.Linear(input_features, output_features)
        self.activation = nn.ReLU()
        # Example of a custom learnable parameter
        # self.my_custom_bias = nn.Parameter(torch.zeros(output_features))

    def forward(self, x):
        # Define the computation flow using the submodules
        x = self.linear_layer(x)
        x = self.activation(x)
        # If self.my_custom_bias was defined:
        # x = x + self.my_custom_bias
        return x

# Instantiate the module
module = SimplePyTorchModule(input_features=10, output_features=5)
print(module)
Output:
SimplePyTorchModule(
  (linear_layer): Linear(in_features=10, out_features=5, bias=True)
  (activation): ReLU()
)
In this __init__ method:

- super(SimplePyTorchModule, self).__init__() is essential to call the constructor of the base nn.Module class.
- self.linear_layer = nn.Linear(input_features, output_features) creates an instance of PyTorch's pre-built linear layer and assigns it as an attribute. Its parameters (weights and biases) are automatically registered with SimplePyTorchModule.
- self.activation = nn.ReLU() creates an instance of the ReLU activation function. ReLU itself doesn't have learnable parameters, but it's still an nn.Module.

This is similar to how you might define layers within the __init__ or build method of a custom tf.keras.Layer subclass. However, the composition into a larger model often differs. In Keras, you might add these to a tf.keras.Sequential model or connect them using the Functional API. In PyTorch, you are typically building up these nn.Module structures directly by nesting.
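For comparison, a rough Keras counterpart to SimplePyTorchModule, written as a subclassed layer, might look like the sketch below. The class name is invented for this illustration, and weight creation is deferred until the first call, as is typical in Keras:

import tensorflow as tf

class SimpleKerasLayer(tf.keras.layers.Layer):  # hypothetical illustrative layer
    def __init__(self, output_features, **kwargs):
        super().__init__(**kwargs)
        # Dense does not need the input size here; its weights are built lazily
        self.dense = tf.keras.layers.Dense(output_features)
        self.activation = tf.keras.layers.ReLU()

    def call(self, inputs):  # analogous to PyTorch's forward()
        return self.activation(self.dense(inputs))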
The forward() Method
The forward() method is where you define the actual computation your module performs. It takes one or more input tensors and returns one or more output tensors. You use the layers and parameters defined in __init__ to transform the input.
Continuing with SimplePyTorchModule:

# (inside SimplePyTorchModule class)
# def forward(self, x):
#     x = self.linear_layer(x)
#     x = self.activation(x)
#     return x
Here, self.linear_layer(x)
calls the forward
method of the nn.Linear
instance. nn.Module
instances are callable; when you call module_instance(input)
, it internally calls module_instance.forward(input)
.
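As a minimal usage sketch, assuming the module instance created above, you can pass a random batch through it and check the output shape; the batch size here is arbitrary:

inputs = torch.randn(4, 10)   # batch of 4 samples, 10 features each
outputs = module(inputs)      # preferred: __call__ dispatches to forward() and runs hooks
# outputs = module.forward(inputs)  # works, but bypasses PyTorch's hook machinery
print(outputs.shape)          # torch.Size([4, 5])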
A significant aspect of PyTorch is its dynamic computation graph (define-by-run). The forward()
method is just Python code. This means you can use standard Python control flow (loops, conditionals) to define complex, adaptive computations. The graph of operations is built on-the-fly as the forward()
method executes. This contrasts with TensorFlow's traditional graph mode (define-and-run), where the graph is typically defined statically first (though TensorFlow's Eager Execution, default in TF 2.x, behaves more like PyTorch's define-by-run).
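To illustrate, here is a minimal sketch of data-dependent control flow inside forward(); the module name, feature size, and repeat count are invented for this example:

import torch
import torch.nn as nn

class DynamicDepthModule(nn.Module):  # hypothetical example module
    def __init__(self, features, max_repeats=3):
        super().__init__()
        self.layer = nn.Linear(features, features)
        self.max_repeats = max_repeats

    def forward(self, x):
        # Ordinary Python loop and conditional; the graph is built as this code runs
        for _ in range(self.max_repeats):
            x = torch.relu(self.layer(x))
            if x.abs().mean() < 0.1:  # data-dependent early exit
                break
        return x

out = DynamicDepthModule(features=8)(torch.randn(2, 8))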
Custom Learnable Parameters with torch.nn.Parameter
While pre-built layers like nn.Linear
manage their own parameters, sometimes you need to define your own custom learnable parameters directly within your module. To do this, you wrap a torch.Tensor
with torch.nn.Parameter
. This tells PyTorch that this tensor should be treated as a model parameter, meaning its requires_grad
attribute will be set to True
by default (so gradients will be computed for it during backpropagation), and it will be included in the list returned by model.parameters()
.
import torch
import torch.nn as nn

class ModuleWithCustomParameter(nn.Module):
    def __init__(self, num_features):
        super(ModuleWithCustomParameter, self).__init__()
        # A learnable scaling factor for each feature
        self.scale = nn.Parameter(torch.ones(num_features))
        # A learnable bias for each feature
        self.bias = nn.Parameter(torch.zeros(num_features))

    def forward(self, x):
        # x is expected to have shape (batch_size, num_features)
        return x * self.scale + self.bias

custom_param_module = ModuleWithCustomParameter(5)

# You can inspect its parameters:
for name, param in custom_param_module.named_parameters():
    print(f"{name}: {param.data}")
This is analogous to using self.add_weight()
in a Keras Layer
's build
method to create and register trainable weights.
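For reference, a rough Keras sketch of the same idea, using add_weight inside build(); the class name is invented for this illustration:

import tensorflow as tf

class KerasScaleBias(tf.keras.layers.Layer):  # hypothetical illustrative layer
    def build(self, input_shape):
        num_features = input_shape[-1]
        # add_weight creates and registers trainable variables with the layer
        self.scale = self.add_weight(shape=(num_features,), initializer="ones", trainable=True)
        self.bias = self.add_weight(shape=(num_features,), initializer="zeros", trainable=True)

    def call(self, inputs):
        return inputs * self.scale + self.bias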
One of the powerful features of nn.Module
is its ability to contain other nn.Module
instances. This allows you to build complex models by composing simpler, reusable blocks in a hierarchical manner. A parent module automatically discovers and manages all parameters from its nested child modules.
Consider building a slightly more complex network that uses the SimplePyTorchModule
we defined earlier as a building block:
class AdvancedNetwork(nn.Module):
    def __init__(self, input_dim, hidden_dim1, hidden_dim2, output_dim):
        super(AdvancedNetwork, self).__init__()
        # Nesting SimplePyTorchModule as a "block"
        self.block1 = SimplePyTorchModule(input_dim, hidden_dim1)
        self.intermediate_linear = nn.Linear(hidden_dim1, hidden_dim2)
        self.relu = nn.ReLU()
        self.output_layer = nn.Linear(hidden_dim2, output_dim)

    def forward(self, x):
        x = self.block1(x)  # Uses the forward pass of SimplePyTorchModule
        x = self.intermediate_linear(x)
        x = self.relu(x)
        x = self.output_layer(x)
        return x

# Example instantiation
adv_net = AdvancedNetwork(input_dim=20, hidden_dim1=15, hidden_dim2=10, output_dim=2)
print(adv_net)
Output:
AdvancedNetwork(
  (block1): SimplePyTorchModule(
    (linear_layer): Linear(in_features=20, out_features=15, bias=True)
    (activation): ReLU()
  )
  (intermediate_linear): Linear(in_features=15, out_features=10, bias=True)
  (relu): ReLU()
  (output_layer): Linear(in_features=10, out_features=2, bias=True)
)
As you can see, AdvancedNetwork
contains block1
, which is an instance of SimplePyTorchModule
. All parameters from block1.linear_layer
, intermediate_linear
, and output_layer
will be part of adv_net.parameters()
. This composability is central to organizing PyTorch code effectively.
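To see this aggregation in practice, here is a short sketch that inspects adv_net as defined above; the counting logic is just for illustration:

# Parameters from the nested block1 appear under prefixed names such as "block1.linear_layer.weight"
for name, param in adv_net.named_parameters():
    print(name, tuple(param.shape))

total = sum(p.numel() for p in adv_net.parameters())
print(f"Total trainable parameters: {total}")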
The following diagram illustrates the structure of a custom PyTorch nn.Module
that contains other modules (both pre-built and potentially other custom ones) and parameters.
A PyTorch nn.Module (MyNetwork) defining its components (submodules like CustomBlockA, nn.ReLU, nn.Linear, and direct nn.Parameters like global_bias) in __init__, and their computational flow in forward. CustomBlockA itself is another nn.Module, demonstrating nesting.
Pre-built Layers in torch.nn
PyTorch comes with a rich set of pre-built layers in the torch.nn package, such as:

- nn.Linear: Fully connected layer.
- nn.Conv1d, nn.Conv2d, nn.Conv3d: Convolutional layers for different dimensionalities.
- nn.RNN, nn.LSTM, nn.GRU: Recurrent layers.
- nn.BatchNorm1d, nn.BatchNorm2d: Batch normalization layers.
- nn.Dropout: Dropout layer.
- nn.ReLU, nn.Sigmoid, nn.Tanh, nn.Softmax (though many activations are also available in torch.nn.functional and can be applied directly in the forward method).

These pre-built layers are all subclasses of nn.Module themselves. You use them by instantiating them in your module's __init__ method and then calling them in the forward method with the appropriate input. This is analogous to how you would use layers from tf.keras.layers in TensorFlow. Subsequent sections will cover these common layer types in more detail, comparing their PyTorch implementations to their Keras counterparts.
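As an illustration of the torch.nn.functional alternative mentioned above, the two sketches below are functionally equivalent; the class names and dimensions are invented for this example:

import torch
import torch.nn as nn
import torch.nn.functional as F

class WithModuleActivation(nn.Module):  # hypothetical example
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(16, 8)
        self.relu = nn.ReLU()        # activation held as a submodule

    def forward(self, x):
        return self.relu(self.fc(x))

class WithFunctionalActivation(nn.Module):  # hypothetical example
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(16, 8)   # no activation module needed

    def forward(self, x):
        return F.relu(self.fc(x))    # stateless function applied in forward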
The shift from Keras's Layer
to PyTorch's nn.Module
involves embracing a structure where you explicitly define the components of your network modules in __init__
and their computational flow in forward
. This explicit definition offers fine-grained control and integrates naturally with Python's dynamic capabilities, making it straightforward to build and debug even very complex model architectures.