Having explored the theoretical underpinnings and design principles of modern CNN architectures like ResNet, DenseNet, and EfficientNet, it's time to translate that understanding into practical implementation. This section guides you through building core components of these networks using common deep learning frameworks. While complete, production-ready implementations involve many details, focusing on the fundamental building blocks provides valuable hands-on experience. We assume you have a working Python environment with either PyTorch or TensorFlow/Keras installed.
Ensure you have your preferred deep learning library installed. For instance, using pip:
# For PyTorch
pip install torch torchvision
# For TensorFlow
pip install tensorflow
Access to a GPU is highly recommended for training these models, even on smaller datasets, but the implementation steps can be followed on a CPU.
Recall that the central idea of ResNet is the residual block, which allows the network to learn an identity mapping if needed, easing the training of very deep networks. The core operation is y=F(x)+x, where F(x) represents the residual mapping learned by a few stacked layers, and x is the input to the block (the identity connection).
A common ResNet block consists of two or three convolutional layers, Batch Normalization, and ReLU activation functions. The skip connection adds the input x to the output of the convolutional path F(x).
A basic residual block structure with two convolutional layers. The input x is added to the output of the second convolutional layer before the final ReLU activation.
Let's define a BasicBlock using PyTorch's nn.Module.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BasicBlock(nn.Module):
    expansion = 1  # No expansion of channels for the basic block

    def __init__(self, in_planes, planes, stride=1):
        super(BasicBlock, self).__init__()
        self.conv1 = nn.Conv2d(in_planes, planes, kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)

        self.shortcut = nn.Sequential()
        # If stride is not 1 or input/output planes differ, project the shortcut
        if stride != 1 or in_planes != self.expansion * planes:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_planes, self.expansion * planes, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(self.expansion * planes)
            )

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out += self.shortcut(x)  # The residual connection addition
        out = F.relu(out)
        return out
# Example usage:
# A block with input planes = 64, output planes = 64, stride = 1
block = BasicBlock(in_planes=64, planes=64, stride=1)

# A block changing dimension and resolution:
# input planes = 64, output planes = 128, stride = 2
downsample_block = BasicBlock(in_planes=64, planes=128, stride=2)

# Test with a dummy input tensor (Batch, Channels, Height, Width)
dummy_input = torch.randn(4, 64, 32, 32)

output = block(dummy_input)
print("Output shape (same dim):", output.shape)  # torch.Size([4, 64, 32, 32])

output_downsampled = downsample_block(dummy_input)
print("Output shape (downsampled):", output_downsampled.shape)  # torch.Size([4, 128, 16, 16])
Here's the equivalent using TensorFlow's Keras API.
import tensorflow as tf
from tensorflow.keras import layers

class BasicBlock(layers.Layer):
    expansion = 1

    def __init__(self, planes, stride=1, **kwargs):
        super(BasicBlock, self).__init__(**kwargs)
        self.conv1 = layers.Conv2D(planes, kernel_size=3, strides=stride, padding='same', use_bias=False)
        self.bn1 = layers.BatchNormalization()
        self.relu = layers.ReLU()
        self.conv2 = layers.Conv2D(planes, kernel_size=3, strides=1, padding='same', use_bias=False)
        self.bn2 = layers.BatchNormalization()
        self.shortcut_conv = None
        self.shortcut_bn = None
        # Store in_planes once built
        self.in_planes = None

    def build(self, input_shape):
        # Determine in_planes from the input shape dynamically
        self.in_planes = input_shape[-1]
        planes = self.conv1.filters  # Get output planes from conv1
        # Define shortcut layers if needed (stride != 1 or channels change)
        if self.conv1.strides[0] != 1 or self.in_planes != self.expansion * planes:
            self.shortcut_conv = layers.Conv2D(self.expansion * planes, kernel_size=1,
                                               strides=self.conv1.strides, use_bias=False)
            self.shortcut_bn = layers.BatchNormalization()
        super(BasicBlock, self).build(input_shape)  # Ensure the base class build is called

    def call(self, x, training=None):  # The `training` argument is important for BatchNorm
        identity = x
        out = self.conv1(x)
        out = self.bn1(out, training=training)
        out = self.relu(out)
        out = self.conv2(out)
        out = self.bn2(out, training=training)
        # Apply the projection shortcut if defined
        if self.shortcut_conv is not None:
            identity = self.shortcut_conv(x)
            identity = self.shortcut_bn(identity, training=training)
        out += identity  # The residual connection addition
        out = self.relu(out)
        return out
# Example usage:
# A block with output planes = 64, stride = 1.
# Input shape is (Batch, Height, Width, Channels) - provide a dummy shape for build
dummy_input_shape = (4, 32, 32, 64)
block = BasicBlock(planes=64, stride=1)
block.build(dummy_input_shape)  # Explicitly build, or simply pass input data

# A block changing dimension and resolution: output planes = 128, stride = 2
downsample_block = BasicBlock(planes=128, stride=2)
downsample_block.build(dummy_input_shape)

# Test with a dummy input tensor
dummy_input = tf.random.normal((4, 32, 32, 64))

output = block(dummy_input, training=False)  # Pass training=False for inference-mode BN
print("Output shape (same dim):", output.shape)  # (4, 32, 32, 64)

output_downsampled = downsample_block(dummy_input, training=False)
print("Output shape (downsampled):", output_downsampled.shape)  # (4, 16, 16, 128)
These examples illustrate the core logic. Building a full ResNet involves stacking these blocks in stages, typically reducing spatial dimensions and increasing channel depth between stages using blocks with stride=2.
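Here is a minimal sketch of that stage pattern, using the PyTorch BasicBlock defined above. The make_stage helper is our own name for illustration; it mirrors the _make_layer pattern used in torchvision's reference implementation.

def make_stage(block, in_planes, planes, num_blocks, stride):
    # Only the first block in a stage may downsample and change the width;
    # the remaining blocks use stride=1 with a constant channel count.
    strides = [stride] + [1] * (num_blocks - 1)
    stage = []
    for s in strides:
        stage.append(block(in_planes, planes, stride=s))
        in_planes = planes * block.expansion
    return nn.Sequential(*stage)

# Illustrative three-stage body: 64 -> 128 -> 256 channels, halving resolution twice
stage1 = make_stage(BasicBlock, 64, 64, num_blocks=2, stride=1)
stage2 = make_stage(BasicBlock, 64, 128, num_blocks=2, stride=2)
stage3 = make_stage(BasicBlock, 128, 256, num_blocks=2, stride=2)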
DenseNet's characteristic feature is its connectivity pattern: each layer receives feature maps from all preceding layers within its block. Instead of adding features like ResNet, DenseNet concatenates them.
A Dense Block concatenates the input with the output of each internal layer. The Transition Layer reduces the number of channels (via 1x1 convolution) and spatial dimensions (via pooling).
Each "BN-ReLU-Conv" unit within the Dense Block typically consists of a 1x1 convolution (bottleneck layer, optional but common) followed by a 3x3 convolution. The number of output channels from the 3x3 convolution is called the growth_rate (k). Because channels accumulate rapidly, Transition Layers are used between Dense Blocks to compress the feature maps (typically halving the number of channels) and reduce spatial resolution.
Implementing a Dense Block requires careful handling of tensor concatenation along the channel dimension:

- PyTorch: torch.cat(tensors, dim=1), where tensors is a list of feature maps to concatenate and dim=1 is the channel dimension.
- TensorFlow/Keras: tf.keras.layers.Concatenate(axis=-1) (or axis=1 if using the channels-first format).

You would define a layer or module for the "BN-ReLU-Conv" unit and then, within the DenseBlock's forward pass, iteratively apply this unit to the concatenated features from all previous layers within the block. The TransitionLayer module would contain BatchNorm, a 1x1 Conv2D, and an AvgPool2D layer.
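A minimal PyTorch sketch of these two pieces follows. The class names are our own, the 4 * growth_rate bottleneck width follows the common DenseNet-BC convention, and the imports from the ResNet example above are assumed.

class DenseLayer(nn.Module):
    # BN-ReLU-Conv(1x1) bottleneck followed by BN-ReLU-Conv(3x3),
    # producing growth_rate new feature maps.
    def __init__(self, in_channels, growth_rate):
        super().__init__()
        inter_channels = 4 * growth_rate  # bottleneck width (DenseNet-BC convention)
        self.bn1 = nn.BatchNorm2d(in_channels)
        self.conv1 = nn.Conv2d(in_channels, inter_channels, kernel_size=1, bias=False)
        self.bn2 = nn.BatchNorm2d(inter_channels)
        self.conv2 = nn.Conv2d(inter_channels, growth_rate, kernel_size=3, padding=1, bias=False)

    def forward(self, x):
        out = self.conv1(F.relu(self.bn1(x)))
        out = self.conv2(F.relu(self.bn2(out)))
        return torch.cat([x, out], dim=1)  # concatenate along channels, not add

class TransitionLayer(nn.Module):
    # Compress channels with a 1x1 convolution, then halve the spatial size.
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.bn = nn.BatchNorm2d(in_channels)
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False)
        self.pool = nn.AvgPool2d(kernel_size=2, stride=2)

    def forward(self, x):
        return self.pool(self.conv(F.relu(self.bn(x))))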
Building a full DenseNet involves creating multiple Dense Blocks separated by Transition Layers, similar to how ResNet stages are constructed.
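Continuing the sketch: because each DenseLayer returns the concatenation of its input and its new features, a Dense Block can simply chain layers whose input width grows by growth_rate at each step. The block depths and growth rate below are illustrative, not a specific published configuration.

def make_dense_block(in_channels, growth_rate, num_layers):
    # The i-th layer sees the original input plus i * growth_rate accumulated channels.
    return nn.Sequential(*[
        DenseLayer(in_channels + i * growth_rate, growth_rate)
        for i in range(num_layers)
    ])

growth_rate = 32
channels = 64
body = []
for num_layers in (6, 12, 24):  # illustrative block depths
    body.append(make_dense_block(channels, growth_rate, num_layers))
    channels += num_layers * growth_rate
    body.append(TransitionLayer(channels, channels // 2))  # halve the channel count
    channels //= 2
densenet_body = nn.Sequential(*body)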
To assemble and train a complete network, the general recipe is:

1. Implement the building blocks (BasicBlock or Bottleneck for ResNet, DenseLayer and DenseBlock for DenseNet).
2. Define a model class (e.g., MyResNet(nn.Module) or MyDenseNet(tf.keras.Model)). This class will stack the blocks into stages and add the initial convolution and the final pooling and classification layers.
3. Set up data loading (torch.utils.data.DataLoader, tf.data.Dataset).
4. Choose a loss function (nn.CrossEntropyLoss, tf.keras.losses.SparseCategoricalCrossentropy).
5. Pick an optimizer (torch.optim.AdamW, tf.keras.optimizers.Adam). Chapter 2 covers advanced optimizers.

Two suggestions for deepening your understanding:

- Compare against reference implementations: torchvision.models and tf.keras.applications provide pre-implemented and often pre-trained versions of popular architectures. Compare your block implementations and overall structure against these references. They are excellent learning resources.
- Experiment with hyperparameters: try changing the growth_rate in DenseNet, or the channel dimensions in ResNet. Observe the effect on parameter count, computational cost (e.g., using profiling tools), and potentially training performance (though full training is beyond this section's scope). A quick parameter count is sketched below.
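For instance, in PyTorch you can count trainable parameters directly; here is a quick check using the BasicBlock defined earlier:

# Count the trainable parameters of one downsampling BasicBlock
probe = BasicBlock(in_planes=64, planes=128, stride=2)
num_params = sum(p.numel() for p in probe.parameters() if p.requires_grad)
print(f"Trainable parameters: {num_params:,}")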
This hands-on practice solidifies your understanding of how architectural concepts translate into code. While we've focused on the building blocks, remember that successful deep learning also involves careful training, optimization, and data handling, which are the subjects of subsequent chapters. By implementing these foundational structures, you are better prepared to customize existing models or even design novel architectures for specific computer vision tasks.