Now that you're familiar with how torch.nn.Module serves as the base for all neural network components in PyTorch, and how it compares to Keras layers and model-building APIs, it's time to put this knowledge into practice. This section focuses on translating Keras model architectures directly into their PyTorch equivalents. By working through these examples, you'll solidify your understanding of how to define layers, structure models, and manage data flow within PyTorch's nn.Module framework.
We will start with simple models and gradually move to structures that highlight the flexibility of PyTorch's define-by-run approach, which becomes especially apparent when building models that aren't strictly sequential.
First, ensure you have PyTorch imported:
import torch
import torch.nn as nn
import torch.nn.functional as F # Often used for activation functions
Let's begin with a basic fully connected neural network, a common starting point in many tutorials.
Keras Sequential Model:
In Keras, you might define a simple two-layer network for, say, MNIST digit classification (assuming flattened input) like this:
# TensorFlow/Keras
# import tensorflow as tf
#
# model_keras_ffn = tf.keras.Sequential([
#     tf.keras.layers.Dense(128, activation='relu', input_shape=(784,)),
#     tf.keras.layers.Dropout(0.2),
#     tf.keras.layers.Dense(10)  # Output layer, activation often handled by loss function
# ])
#
# model_keras_ffn.summary()
The input_shape is defined in the first layer. The final activation (e.g., softmax for classification) is sometimes omitted if the loss function (like tf.keras.losses.CategoricalCrossentropy(from_logits=True)) expects raw logits.
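PyTorch follows the same convention. nn.CrossEntropyLoss consumes raw logits directly (it applies log-softmax internally) and, unlike Keras's CategoricalCrossentropy, takes integer class indices rather than one-hot targets. A quick illustration with dummy values:

# nn.CrossEntropyLoss pairs raw logits with integer class labels
criterion = nn.CrossEntropyLoss()
logits = torch.randn(4, 10)            # raw scores for a batch of 4 samples
targets = torch.tensor([1, 0, 9, 3])   # class indices, not one-hot vectors
loss = criterion(logits, targets)
print(loss.item())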
PyTorch nn.Module Equivalent:
To build this in PyTorch, we'll subclass nn.Module:
class PyTorchSimpleFFN(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes, dropout_rate=0.2):
        super(PyTorchSimpleFFN, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.dropout = nn.Dropout(dropout_rate)
        self.fc2 = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        # x is expected to be of shape (batch_size, input_size)
        out = self.fc1(x)
        out = self.relu(out)
        out = self.dropout(out)
        out = self.fc2(out)
        # Note: Softmax is often applied later, e.g., within nn.CrossEntropyLoss
        return out
# Instantiate the model
input_size = 784
hidden_size = 128
num_classes = 10
pytorch_model_ffn = PyTorchSimpleFFN(input_size, hidden_size, num_classes)
print(pytorch_model_ffn)
# Example usage with dummy data:
dummy_input_ffn = torch.randn(64, input_size) # Batch of 64, 784 features
output_ffn = pytorch_model_ffn(dummy_input_ffn)
print("Output shape:", output_ffn.shape) # Expected: torch.Size([64, 10])
Observations:

- Layer definition in __init__: PyTorch layers (e.g., nn.Linear, nn.ReLU, nn.Dropout) are typically defined as attributes in the __init__ method. nn.Linear is PyTorch's equivalent of Keras's Dense layer.
- The forward method: The forward method explicitly defines how input data flows through the layers. This is where PyTorch's dynamic nature shines.
- Input shape: While in Keras input_shape can be specified in the first layer, PyTorch models generally adapt to the input shape they receive in the forward method. The nn.Linear layer's in_features must match the feature dimension of the input tensor.
- Activation functions: nn.ReLU is an nn.Module, so it's instantiated in __init__. Alternatively, F.relu (from torch.nn.functional) can be applied directly in forward without prior instantiation, as sketched after this list.
- Output activation: When using nn.CrossEntropyLoss in PyTorch (which combines nn.LogSoftmax and nn.NLLLoss), you typically don't apply a softmax activation to the output layer within the model itself. The raw scores (logits) are passed to the loss function.
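For comparison, here is a sketch of the same network written in the functional style mentioned above, applying activations and dropout directly in forward via torch.nn.functional instead of storing them as modules. The class name FunctionalStyleFFN is illustrative:

class FunctionalStyleFFN(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes, dropout_rate=0.2):
        super().__init__()
        # Only layers with parameters need to be attributes
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.fc2 = nn.Linear(hidden_size, num_classes)
        self.dropout_rate = dropout_rate

    def forward(self, x):
        x = F.relu(self.fc1(x))  # F.relu needs no instantiation
        # F.dropout must be told whether the model is in training mode
        x = F.dropout(x, p=self.dropout_rate, training=self.training)
        return self.fc2(x)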
CNNs are fundamental for image processing. Let's translate a simple Keras CNN.
Keras Sequential CNN:
# TensorFlow/Keras
# import tensorflow as tf
#
# model_keras_cnn = tf.keras.Sequential([
#     tf.keras.layers.Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=(28, 28, 1), padding='same'),
#     tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
#     tf.keras.layers.Conv2D(64, kernel_size=(3, 3), activation='relu', padding='same'),
#     tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
#     tf.keras.layers.Flatten(),
#     tf.keras.layers.Dense(128, activation='relu'),
#     tf.keras.layers.Dense(10)  # Output logits
# ])
#
# model_keras_cnn.summary()
This Keras model assumes an input of shape (height, width, channels).

PyTorch nn.Module Equivalent:
PyTorch expects image data in (batch_size, channels, height, width) format.
class PyTorchSimpleCNN(nn.Module):
    def __init__(self, num_classes=10):
        super(PyTorchSimpleCNN, self).__init__()
        # Convolutional Layer 1
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=32, kernel_size=3, padding=1)  # padding='same'
        self.relu1 = nn.ReLU()
        self.pool1 = nn.MaxPool2d(kernel_size=2, stride=2)

        # Convolutional Layer 2
        self.conv2 = nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, padding=1)  # padding='same'
        self.relu2 = nn.ReLU()
        self.pool2 = nn.MaxPool2d(kernel_size=2, stride=2)

        # Flattening and Fully Connected Layers
        # To determine the input size for fc1, we need to calculate the output shape after conv and pool layers.
        # Assuming input 28x28:
        # After conv1 (32, 28, 28), pool1 (32, 14, 14)
        # After conv2 (64, 14, 14), pool2 (64, 7, 7)
        # Flattened size: 64 * 7 * 7
        self.fc1_input_features = 64 * 7 * 7
        self.fc1 = nn.Linear(self.fc1_input_features, 128)
        self.relu3 = nn.ReLU()
        self.fc2 = nn.Linear(128, num_classes)

    def forward(self, x):
        # x shape: (batch_size, 1, 28, 28)
        x = self.pool1(self.relu1(self.conv1(x)))
        x = self.pool2(self.relu2(self.conv2(x)))
        # Flatten the output for the fully connected layer
        # x.size(0) is batch_size. -1 infers the rest.
        x = x.view(x.size(0), -1)  # Or use torch.flatten(x, start_dim=1)
        x = self.relu3(self.fc1(x))
        x = self.fc2(x)
        return x
# Instantiate the model
pytorch_model_cnn = PyTorchSimpleCNN(num_classes=10)
print(pytorch_model_cnn)
# Example usage with dummy data:
dummy_input_cnn = torch.randn(64, 1, 28, 28) # Batch of 64, 1 channel, 28x28 images
output_cnn = pytorch_model_cnn(dummy_input_cnn)
print("Output shape:", output_cnn.shape) # Expected: torch.Size([64, 10])
Observations:

- Data format: PyTorch uses the N C H W (Batch, Channels, Height, Width) convention for image tensors, contrasting with TensorFlow's default N H W C.
- nn.Conv2d: Parameters include in_channels, out_channels, kernel_size, stride, and padding. padding=1 for a kernel_size=3 often approximates Keras's padding='same' if stride is 1.
- Flattening: x.view(x.size(0), -1) or torch.flatten(x, start_dim=1) are common. You must calculate the number of features after the last pooling layer to correctly size the first nn.Linear layer. This is a manual step in PyTorch unless you use adaptive pooling layers (e.g., nn.AdaptiveAvgPool2d((1, 1))), which output a fixed-size feature map regardless of input size, simplifying the subsequent flatten and linear layer definition; see the sketch after this list.
- nn.MaxPool2d: kernel_size and stride are key parameters.
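To make the adaptive-pooling point concrete, here is a sketch of the same CNN rewritten so the first linear layer's size no longer depends on the input resolution. AdaptiveCNN is an illustrative name, not part of the earlier example:

class AdaptiveCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)  # reused for both conv stages
        self.adaptive_pool = nn.AdaptiveAvgPool2d((1, 1))  # collapses each feature map to 1x1
        self.fc = nn.Linear(64, num_classes)  # always 64 features, for any input size

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = self.adaptive_pool(x)            # shape: (batch, 64, 1, 1)
        x = torch.flatten(x, start_dim=1)    # shape: (batch, 64)
        return self.fc(x)

# Works for 28x28 and 64x64 inputs alike:
adaptive_cnn = AdaptiveCNN()
print(adaptive_cnn(torch.randn(8, 1, 28, 28)).shape)  # torch.Size([8, 10])
print(adaptive_cnn(torch.randn(8, 1, 64, 64)).shape)  # torch.Size([8, 10])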
Keras's Functional API allows for more complex architectures, like those with multiple inputs, multiple outputs, or skip connections. PyTorch's nn.Module inherently supports this flexibility through the forward method. Let's build a simple block with a residual (skip) connection.
Keras Functional API Model (Illustrative Block):
# TensorFlow/Keras
# import tensorflow as tf
#
# input_tensor = tf.keras.Input(shape=(64, 64, 3))
# x = tf.keras.layers.Conv2D(32, (3,3), padding='same', activation='relu')(input_tensor)
# x = tf.keras.layers.Conv2D(32, (3,3), padding='same')(x) # No activation yet
#
# # Example: A simplified residual connection
# # For a true ResNet block, channel dimensions might need matching (e.g., with a 1x1 conv)
# # Here, we assume input_tensor and x have compatible shapes for addition after the convs
# # Or, if channels differ, project input_tensor
# identity = tf.keras.layers.Conv2D(32, (1,1), padding='same')(input_tensor) # Project identity
#
# added = tf.keras.layers.Add()([x, identity])
# output_tensor = tf.keras.layers.Activation('relu')(added)
#
# model_keras_functional = tf.keras.Model(inputs=input_tensor, outputs=output_tensor)
# model_keras_functional.summary()
PyTorch nn.Module Equivalent (Simplified Residual Block):
We'll create a module that implements a simplified residual block where the input is added back to the output of a couple of convolutional layers. For simplicity, we'll assume the number of channels remains the same, or we'll use a 1x1 convolution for the identity path if dimensions need matching.
class PyTorchResidualBlock(nn.Module):
    def __init__(self, in_channels, out_channels, stride=1):
        super(PyTorchResidualBlock, self).__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)  # Batch Normalization is common in ResNets
        self.relu = nn.ReLU(inplace=True)  # inplace=True can save memory
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)

        # Shortcut connection (identity or projection)
        self.shortcut = nn.Sequential()
        if stride != 1 or in_channels != out_channels:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels)
            )

    def forward(self, x):
        identity = self.shortcut(x)  # Apply shortcut transformation to x
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)
        out = self.conv2(out)
        out = self.bn2(out)
        out += identity  # Element-wise addition
        out = self.relu(out)
        return out
# Instantiate the block
# Example: input with 3 channels, output with 64 channels, downsample
res_block = PyTorchResidualBlock(in_channels=3, out_channels=64, stride=2)
print(res_block)
# Example usage with dummy data:
dummy_input_res = torch.randn(16, 3, 224, 224) # Batch of 16, 3 channels, 224x224 images
output_res = res_block(dummy_input_res)
# If stride=2, H and W will be halved. Output channels will be 64.
print("Output shape:", output_res.shape) # Expected: torch.Size([16, 64, 112, 112])
# Example: no change in channels or dimensions
res_block_same_dim = PyTorchResidualBlock(in_channels=64, out_channels=64, stride=1)
dummy_input_same_dim = torch.randn(16, 64, 56, 56)
output_same_dim = res_block_same_dim(dummy_input_same_dim)
print("Output shape (same dim):", output_same_dim.shape) # Expected: torch.Size([16, 64, 56, 56])
Observations:

- forward logic: The forward method can implement any computation, including skip connections, branches, or custom operations. This is where PyTorch's flexibility is most evident. You directly define the data flow.
- nn.Sequential for shortcuts: nn.Sequential can be used to group layers, for example, in the shortcut connection if a projection (like a 1x1 convolution) is needed to match dimensions.
- Reusing layers: A stateless module like self.relu can be defined once in __init__ and called multiple times in forward. If you need different instances of the same type of layer, you define them as separate attributes in __init__.
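To see how such a block composes into a larger network, the sketch below stacks two of the residual blocks defined above into a small classifier, much as the Keras Functional API chains layer calls. TinyResNet and its layer sizes are illustrative choices, not a reference implementation:

class TinyResNet(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.stem = nn.Conv2d(3, 64, kernel_size=3, padding=1, bias=False)
        self.block1 = PyTorchResidualBlock(in_channels=64, out_channels=64, stride=1)
        self.block2 = PyTorchResidualBlock(in_channels=64, out_channels=128, stride=2)  # downsamples; projection shortcut kicks in
        self.pool = nn.AdaptiveAvgPool2d((1, 1))  # fixed-size output regardless of input resolution
        self.fc = nn.Linear(128, num_classes)

    def forward(self, x):
        x = self.stem(x)
        x = self.block1(x)
        x = self.block2(x)
        x = self.pool(x)
        x = torch.flatten(x, start_dim=1)
        return self.fc(x)

tiny_resnet = TinyResNet()
print(tiny_resnet(torch.randn(2, 3, 64, 64)).shape)  # Expected: torch.Size([2, 10])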
While we've constructed "equivalent" models, true equivalence means they produce the same output for the same input given the same weights. Comparing the printed structures (print(model) in PyTorch against Keras's model.summary()) helps confirm the architectures line up.
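If you go further and actually transfer weights, watch the layout conventions: Keras stores a Dense kernel as (in_features, out_features), while nn.Linear stores its weight as (out_features, in_features), so the kernel must be transposed. A hypothetical helper, assuming the Keras weights have been extracted as NumPy arrays (e.g., via layer.get_weights()):

import numpy as np

def load_keras_dense(linear, kernel, bias):
    # Copy one Keras Dense layer's parameters into an nn.Linear
    with torch.no_grad():
        linear.weight.copy_(torch.from_numpy(kernel).T)  # transpose the kernel
        linear.bias.copy_(torch.from_numpy(bias))

# e.g., load_keras_dense(pytorch_model_ffn.fc1, kernel_1, bias_1)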
These hands-on examples demonstrate the core process of translating Keras models to PyTorch. The key is understanding how to define layers within __init__ and then explicitly orchestrate their execution in the forward method. As you've seen, PyTorch's nn.Module provides a very flexible and Pythonic way to define even complex model architectures. The next step is to learn how to feed data into these models and train them.