Now that you're familiar with how `torch.nn.Module` serves as the base for all neural network components in PyTorch, and how it compares to Keras layers and model-building APIs, it's time to put this knowledge into practice. This section focuses on translating Keras model architectures directly into their PyTorch equivalents. By working through these examples, you'll solidify your understanding of how to define layers, structure models, and manage data flow within PyTorch's `nn.Module` framework.

We will start with simple models and gradually move to structures that highlight the flexibility of PyTorch's define-by-run approach, which becomes especially apparent when building models that aren't strictly sequential.

First, ensure you have PyTorch imported:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F  # Often used for activation functions
```

### Example 1: A Simple Feedforward Network (Dense Layers)

Let's begin with a basic fully connected neural network, a common starting point in many tutorials.

**Keras Sequential Model:**

In Keras, you might define a simple two-layer network for, say, MNIST digit classification (assuming flattened input) like this:

```python
# TensorFlow/Keras
# import tensorflow as tf
#
# model_keras_ffn = tf.keras.Sequential([
#     tf.keras.layers.Dense(128, activation='relu', input_shape=(784,)),
#     tf.keras.layers.Dropout(0.2),
#     tf.keras.layers.Dense(10)  # Output layer, activation often handled by loss function
# ])
#
# model_keras_ffn.summary()
```

The `input_shape` is defined in the first layer. The final activation (e.g., softmax for classification) is sometimes omitted if the loss function (like `tf.keras.losses.CategoricalCrossentropy(from_logits=True)`) expects raw logits.

**PyTorch `nn.Module` Equivalent:**

To build this in PyTorch, we'll subclass `nn.Module`:

```python
class PyTorchSimpleFFN(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes, dropout_rate=0.2):
        super(PyTorchSimpleFFN, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.dropout = nn.Dropout(dropout_rate)
        self.fc2 = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        # x is expected to be of shape (batch_size, input_size)
        out = self.fc1(x)
        out = self.relu(out)
        out = self.dropout(out)
        out = self.fc2(out)
        # Note: Softmax is often applied later, e.g., within nn.CrossEntropyLoss
        return out

# Instantiate the model
input_size = 784
hidden_size = 128
num_classes = 10
pytorch_model_ffn = PyTorchSimpleFFN(input_size, hidden_size, num_classes)
print(pytorch_model_ffn)

# Example usage with dummy data:
dummy_input_ffn = torch.randn(64, input_size)  # Batch of 64, 784 features
output_ffn = pytorch_model_ffn(dummy_input_ffn)
print("Output shape:", output_ffn.shape)  # Expected: torch.Size([64, 10])
```

**Observations:**

- **Layers in `__init__`:** PyTorch layers (e.g., `nn.Linear`, `nn.ReLU`, `nn.Dropout`) are typically defined as attributes in the `__init__` method. `nn.Linear` is PyTorch's equivalent of Keras's `Dense` layer.
- **Explicit `forward` method:** The `forward` method explicitly defines how input data flows through the layers. This is where PyTorch's dynamic nature shines.
- **Input shape:** Unlike Keras, where `input_shape` can be specified in the first layer, PyTorch models generally adapt to the input shape they receive in the `forward` method. The `nn.Linear` layer's `in_features` must match the feature dimension of the input tensor.
- **Activation functions:** `nn.ReLU` is an `nn.Module`, so it's instantiated in `__init__`. Alternatively, `F.relu` (from `torch.nn.functional`) can be applied directly in `forward` without prior instantiation (see the sketch after this list).
- **Output layer activation:** Similar to Keras, if you're using a loss function like `nn.CrossEntropyLoss` in PyTorch (which combines `nn.LogSoftmax` and `nn.NLLLoss`), you typically don't apply a softmax activation to the output layer within the model itself. The raw scores (logits) are passed to the loss function.
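To make the functional-style alternative concrete, here is a minimal sketch of the same network written with `torch.nn.functional` calls, reusing the imports above. The class name `PyTorchFunctionalFFN` is our own choice for this sketch; only the two `nn.Linear` layers hold learnable parameters, so only they need to be registered in `__init__`.

```python
class PyTorchFunctionalFFN(nn.Module):
    def __init__(self, input_size=784, hidden_size=128, num_classes=10, dropout_rate=0.2):
        super(PyTorchFunctionalFFN, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.fc2 = nn.Linear(hidden_size, num_classes)
        self.dropout_rate = dropout_rate

    def forward(self, x):
        x = F.relu(self.fc1(x))  # stateless activation, applied on the fly
        # F.dropout does not know the model's train/eval state by itself,
        # so self.training must be passed explicitly; the nn.Dropout module
        # handles this bookkeeping automatically.
        x = F.dropout(x, p=self.dropout_rate, training=self.training)
        return self.fc2(x)

# Same interface as the module-based version:
print(PyTorchFunctionalFFN()(torch.randn(64, 784)).shape)  # torch.Size([64, 10])
```

The need to pass `self.training` by hand is one reason the `nn.Dropout` module form is generally preferred for dropout, even by people who use `F.relu` for activations.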
### Example 2: A Basic Convolutional Neural Network (CNN)

CNNs are fundamental for image processing. Let's translate a simple Keras CNN.

**Keras Sequential CNN:**

```python
# TensorFlow/Keras
# import tensorflow as tf
#
# model_keras_cnn = tf.keras.Sequential([
#     tf.keras.layers.Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=(28, 28, 1), padding='same'),
#     tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
#     tf.keras.layers.Conv2D(64, kernel_size=(3, 3), activation='relu', padding='same'),
#     tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
#     tf.keras.layers.Flatten(),
#     tf.keras.layers.Dense(128, activation='relu'),
#     tf.keras.layers.Dense(10)  # Output logits
# ])
#
# model_keras_cnn.summary()
```

This Keras model assumes an input of shape `(height, width, channels)`.

**PyTorch `nn.Module` Equivalent:**

PyTorch expects image data in `(batch_size, channels, height, width)` format.

```python
class PyTorchSimpleCNN(nn.Module):
    def __init__(self, num_classes=10):
        super(PyTorchSimpleCNN, self).__init__()
        # Convolutional Layer 1
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=32, kernel_size=3, padding=1)  # padding='same'
        self.relu1 = nn.ReLU()
        self.pool1 = nn.MaxPool2d(kernel_size=2, stride=2)
        # Convolutional Layer 2
        self.conv2 = nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, padding=1)  # padding='same'
        self.relu2 = nn.ReLU()
        self.pool2 = nn.MaxPool2d(kernel_size=2, stride=2)
        # Flattening and Fully Connected Layers
        # To determine the input size for fc1, we need to calculate the output shape after conv and pool layers.
        # Assuming input 28x28:
        # After conv1 (32, 28, 28), pool1 (32, 14, 14)
        # After conv2 (64, 14, 14), pool2 (64, 7, 7)
        # Flattened size: 64 * 7 * 7
        self.fc1_input_features = 64 * 7 * 7
        self.fc1 = nn.Linear(self.fc1_input_features, 128)
        self.relu3 = nn.ReLU()
        self.fc2 = nn.Linear(128, num_classes)

    def forward(self, x):
        # x shape: (batch_size, 1, 28, 28)
        x = self.pool1(self.relu1(self.conv1(x)))
        x = self.pool2(self.relu2(self.conv2(x)))
        # Flatten the output for the fully connected layer
        # x.size(0) is batch_size. -1 infers the rest.
        x = x.view(x.size(0), -1)  # Or use torch.flatten(x, start_dim=1)
        x = self.relu3(self.fc1(x))
        x = self.fc2(x)
        return x

# Instantiate the model
pytorch_model_cnn = PyTorchSimpleCNN(num_classes=10)
print(pytorch_model_cnn)

# Example usage with dummy data:
dummy_input_cnn = torch.randn(64, 1, 28, 28)  # Batch of 64, 1 channel, 28x28 images
output_cnn = pytorch_model_cnn(dummy_input_cnn)
print("Output shape:", output_cnn.shape)  # Expected: torch.Size([64, 10])
```

**Observations:**

- **Channel order:** Remember PyTorch's NCHW (batch, channels, height, width) convention for image tensors, contrasting with TensorFlow's default NHWC.
- **`nn.Conv2d`:** Parameters include `in_channels`, `out_channels`, `kernel_size`, `stride`, and `padding`. With `stride=1`, `padding=1` for a `kernel_size=3` reproduces Keras's `padding='same'`.
- **Flattening:** The transition from convolutional/pooling layers to dense layers requires flattening. In PyTorch, `x.view(x.size(0), -1)` or `torch.flatten(x, start_dim=1)` are common. You must calculate the number of features after the last pooling layer to correctly size the first `nn.Linear` layer. This is a manual step in PyTorch unless you use adaptive pooling layers (e.g., `nn.AdaptiveAvgPool2d((1, 1))`), which output a fixed-size feature map regardless of input size, simplifying the subsequent flatten and linear layer definitions (see the sketch after this list).
- **`nn.MaxPool2d`:** `kernel_size` and `stride` are important parameters.
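To show what the adaptive-pooling route looks like, here is a minimal sketch of the same two-convolution architecture where `nn.AdaptiveAvgPool2d` replaces the manual `64 * 7 * 7` arithmetic. The class name `PyTorchAdaptiveCNN` is our own, and the layer hyperparameters simply mirror the example above.

```python
class PyTorchAdaptiveCNN(nn.Module):
    def __init__(self, num_classes=10):
        super(PyTorchAdaptiveCNN, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)  # no weights, safe to reuse
        # Collapses each of the 64 feature maps to a single value,
        # whatever the incoming spatial size happens to be.
        self.adaptive_pool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(64, num_classes)  # 64 * 1 * 1 features; no manual arithmetic

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = self.adaptive_pool(x)           # (batch, 64, 1, 1) for any H, W
        x = torch.flatten(x, start_dim=1)   # (batch, 64)
        return self.fc(x)

# The same module accepts different input resolutions without edits:
for size in (28, 32):
    out = PyTorchAdaptiveCNN()(torch.randn(4, 1, size, size))
    print(size, "->", out.shape)  # torch.Size([4, 10]) in both cases
```

Note that this is an architectural change rather than a drop-in equivalent of the `Flatten`-based model: global average pooling discards the spatial layout of the final feature maps (the closest Keras analogue is `GlobalAveragePooling2D`).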
### Example 3: A Model with a Skip Connection (Functional API Style)

Keras's Functional API allows for more complex architectures, like those with multiple inputs, multiple outputs, or skip connections. PyTorch's `nn.Module` inherently supports this flexibility through the `forward` method. Let's build a simple block with a residual (skip) connection.

**Keras Functional API Model (Illustrative Block):**

```python
# TensorFlow/Keras
# import tensorflow as tf
#
# input_tensor = tf.keras.Input(shape=(64, 64, 3))
# x = tf.keras.layers.Conv2D(32, (3, 3), padding='same', activation='relu')(input_tensor)
# x = tf.keras.layers.Conv2D(32, (3, 3), padding='same')(x)  # No activation yet
#
# # Example: A simplified residual connection
# # For a true ResNet block, channel dimensions might need matching (e.g., with a 1x1 conv)
# # Here, we assume input_tensor and x have compatible shapes for addition after the convs
# # Or, if channels differ, project input_tensor
# identity = tf.keras.layers.Conv2D(32, (1, 1), padding='same')(input_tensor)  # Project identity
#
# added = tf.keras.layers.Add()([x, identity])
# output_tensor = tf.keras.layers.Activation('relu')(added)
#
# model_keras_functional = tf.keras.Model(inputs=input_tensor, outputs=output_tensor)
# model_keras_functional.summary()
```

**PyTorch `nn.Module` Equivalent (Simplified Residual Block):**

We'll create a module that implements a simplified residual block where the input is added back to the output of a couple of convolutional layers. For simplicity, we'll assume the number of channels remains the same, or we'll use a 1x1 convolution for the identity path if dimensions need matching.

```python
class PyTorchResidualBlock(nn.Module):
    def __init__(self, in_channels, out_channels, stride=1):
        super(PyTorchResidualBlock, self).__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)  # Batch Normalization is common in ResNets
        self.relu = nn.ReLU(inplace=True)  # inplace=True can save memory
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)

        # Shortcut connection (identity or projection)
        self.shortcut = nn.Sequential()
        if stride != 1 or in_channels != out_channels:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels)
            )

    def forward(self, x):
        identity = self.shortcut(x)  # Apply shortcut transformation to x
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)
        out = self.conv2(out)
        out = self.bn2(out)
        out += identity  # Element-wise addition
        out = self.relu(out)
        return out

# Instantiate the block
# Example: input with 3 channels, output with 64 channels, downsample
res_block = PyTorchResidualBlock(in_channels=3, out_channels=64, stride=2)
print(res_block)

# Example usage with dummy data:
dummy_input_res = torch.randn(16, 3, 224, 224)  # Batch of 16, 3 channels, 224x224 images
output_res = res_block(dummy_input_res)
# If stride=2, H and W will be halved. Output channels will be 64.
print("Output shape:", output_res.shape)  # Expected: torch.Size([16, 64, 112, 112])

# Example: no change in channels or dimensions
res_block_same_dim = PyTorchResidualBlock(in_channels=64, out_channels=64, stride=1)
dummy_input_same_dim = torch.randn(16, 64, 56, 56)
output_same_dim = res_block_same_dim(dummy_input_same_dim)
print("Output shape (same dim):", output_same_dim.shape)  # Expected: torch.Size([16, 64, 56, 56])
```

**Observations:**

- **Arbitrary `forward` logic:** The `forward` method can implement any computation, including skip connections, branches, or custom operations. This is where PyTorch's flexibility is most evident. You directly define the data flow.
- **`nn.Sequential` for shortcuts:** `nn.Sequential` can be used to group layers, for example, in the shortcut connection when a projection (like a 1x1 convolution) is needed to match dimensions.
- **Parameter sharing:** If you want to reuse the same instance of a layer multiple times (with shared weights), you define it once in `__init__` and call it multiple times in `forward`. If you need different instances of the same type of layer, you define them as separate attributes in `__init__` (see the sketch after this list).
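The parameter-sharing distinction is easy to demonstrate with a small contrived module of our own (`SharedVsSeparate` is not part of either framework): one `nn.Linear` attribute is called twice, two others are independent instances.

```python
class SharedVsSeparate(nn.Module):
    def __init__(self):
        super(SharedVsSeparate, self).__init__()
        self.shared = nn.Linear(32, 32)  # one weight matrix, applied twice below
        self.block1 = nn.Linear(32, 32)  # independent weights
        self.block2 = nn.Linear(32, 32)  # independent weights

    def forward(self, x):
        x = F.relu(self.shared(x))
        x = F.relu(self.shared(x))  # reuses exactly the same parameters as the call above
        x = F.relu(self.block1(x))
        return self.block2(x)       # a different weight matrix than block1

model = SharedVsSeparate()
# Three Linear layers' worth of parameters, not four:
print(sum(p.numel() for p in model.parameters()))  # 3 * (32 * 32 + 32) = 3168
```

Because `self.shared` is registered once, its weights appear once in `model.parameters()` and receive gradients from both call sites during backpropagation.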
print("Output shape:", output_res.shape) # Expected: torch.Size([16, 64, 112, 112]) # Example: no change in channels or dimensions res_block_same_dim = PyTorchResidualBlock(in_channels=64, out_channels=64, stride=1) dummy_input_same_dim = torch.randn(16, 64, 56, 56) output_same_dim = res_block_same_dim(dummy_input_same_dim) print("Output shape (same dim):", output_same_dim.shape) # Expected: torch.Size([16, 64, 56, 56])Observations:Arbitrary forward Logic: The forward method can implement any computation, including skip connections, branches, or custom operations. This is where PyTorch's flexibility is most evident. You directly define the data flow.nn.Sequential for Shortcuts: nn.Sequential can be used to group layers, for example, in the shortcut connection if a projection (like a 1x1 convolution) is needed to match dimensions.Parameter Sharing: If you wanted to reuse the same instance of a layer multiple times (with shared weights), you would define it once in __init__ and call it multiple times in forward. If you need different instances of the same type of layer, you define them as separate attributes in __init__.Verifying EquivalenceWhile we've constructed "equivalent" models, true equivalence means they produce the same output for the same input given the same weights.Architecture: The primary check is that the sequence of layers, their types, and their parameters (kernel sizes, number of units/filters, strides, padding) match. Printing the model structure in both PyTorch and Keras (using model.summary()) helps.Parameters Count: The number of trainable parameters should be very similar. Minor differences might arise from how biases are handled by default or specific layer implementations, but major discrepancies indicate an architectural mismatch.Weight Initialization: Default weight initialization schemes can differ between TensorFlow/Keras and PyTorch. To get identical outputs (for testing purposes), you'd need to manually set all weights to be the same. This is usually not necessary for general model translation, as models are trained from random initializations anyway.Forward Pass Test: Feeding a common dummy input (appropriately formatted for each framework) and checking the output shapes is a good sanity check.These hands-on examples demonstrate the core process of translating Keras models to PyTorch. The core is understanding how to define layers within __init__ and then explicitly orchestrate their execution in the forward method. As you've seen, PyTorch's nn.Module provides a very flexible and Pythonic way to define even complex model architectures. The next step is to learn how to feed data into these models and train them.