Building upon the idea of shortcut connections introduced in architectures like ResNet, Densely Connected Convolutional Networks (DenseNets) propose an alternative, highly effective connectivity pattern. Instead of combining features through summation before they are passed into a layer, DenseNets combine features by concatenating them. The defining characteristic is that each layer receives the feature maps from all preceding layers within its block as input, and its own output feature maps are used as inputs for all subsequent layers in that block.
The fundamental unit of a DenseNet is the Dense Block. Within a Dense Block containing L layers, the l-th layer receives the feature maps of all preceding layers, x_0, ..., x_{l-1}, as input. Its output x_l is computed as:

x_l = H_l([x_0, x_1, ..., x_{l-1}])

Here, [x_0, x_1, ..., x_{l-1}] represents the concatenation of the feature maps produced in layers 0, ..., l-1. The function H_l(·) is a composite function representing the operations within the layer, typically consisting of Batch Normalization (BN), followed by a Rectified Linear Unit (ReLU) activation, and finally a 3x3 Convolution (Conv). This BN-ReLU-Conv ordering is standard in DenseNets.
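To make the composite function concrete, here is a minimal PyTorch sketch of a single dense layer; the class name DenseLayer and the growth_rate argument are illustrative choices under these assumptions, not names taken from the original DenseNet code.

```python
import torch
import torch.nn as nn

class DenseLayer(nn.Module):
    """One composite function H_l: BN -> ReLU -> 3x3 Conv, producing `growth_rate` feature maps."""
    def __init__(self, in_channels, growth_rate):
        super().__init__()
        self.norm = nn.BatchNorm2d(in_channels)
        self.relu = nn.ReLU(inplace=True)
        # Padding of 1 keeps spatial dimensions unchanged so outputs can later be concatenated.
        self.conv = nn.Conv2d(in_channels, growth_rate,
                              kernel_size=3, padding=1, bias=False)

    def forward(self, x):
        # x is the channel-wise concatenation [x_0, ..., x_{l-1}] of all earlier outputs.
        return self.conv(self.relu(self.norm(x)))
```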
Figure: Connectivity within a Dense Block with three layers (L=3). Each layer H_l receives the concatenated feature maps [x_0, ..., x_{l-1}] from all preceding layers. Dashed lines indicate feature maps routed to later concatenations without passing through the layer's computation H_l.
A significant hyperparameter in DenseNets is the growth rate, denoted by k. The composite function H_l in each layer produces k output feature maps. Since each layer receives the feature maps of all preceding layers, the input to layer l has k_0 + (l-1) × k channels, where k_0 is the number of channels in the input to the Dense Block.
The growth rate k controls how much new "information" each layer contributes to the collective knowledge within the block. A small growth rate (e.g., k=12 or k=32) makes the network very parameter-efficient, as each layer only adds a few feature maps. This design choice is based on the hypothesis that feature maps from earlier layers are directly accessible by later layers, reducing the need for layers to relearn redundant features. DenseNets can achieve competitive accuracy with substantially fewer parameters than architectures like ResNet, largely due to this feature reuse and small growth rate.
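Building on the DenseLayer sketch above, a Dense Block can be assembled so that layer l receives k_0 + (l-1) × k input channels; the structure below is a simplified sketch under those assumptions, not the reference implementation.

```python
class DenseBlock(nn.Module):
    """A stack of dense layers whose inputs grow by `growth_rate` channels per layer."""
    def __init__(self, in_channels, growth_rate, num_layers):
        super().__init__()
        self.layers = nn.ModuleList(
            # Layer i (0-indexed) sees in_channels + i * growth_rate input channels.
            DenseLayer(in_channels + i * growth_rate, growth_rate)
            for i in range(num_layers)
        )

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            new_maps = layer(torch.cat(features, dim=1))  # concatenate along the channel axis
            features.append(new_maps)
        # The block's output concatenates the input with every layer's k new feature maps.
        return torch.cat(features, dim=1)
```

With k_0 = 64 input channels, k = 32, and 6 layers, for example, the block's final concatenation has 64 + 6 × 32 = 256 channels.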
Dense Blocks cannot grow indefinitely, primarily because the number of concatenated feature maps increases linearly with depth (k_0 + L × k channels after L layers), which becomes computationally expensive. Furthermore, pooling operations are needed to downsample the spatial dimensions of the feature maps, a standard practice in CNNs.
DenseNets introduce Transition Layers between consecutive Dense Blocks to address these issues. A Transition Layer typically consists of:
- a Batch Normalization layer and a 1x1 convolution that reduces (compresses) the number of channels, controlled by a compression factor θ (0.5 in DenseNet-BC), and
- a 2x2 average pooling layer with stride 2 that halves the spatial dimensions of the feature maps.
Figure: High-level structure showing a Transition Layer connecting two Dense Blocks, performing compression and downsampling.
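A corresponding sketch of a Transition Layer, assuming the common compression factor of 0.5 from the DenseNet-BC variant:

```python
class TransitionLayer(nn.Module):
    """Compresses channels with a 1x1 conv and halves spatial size with 2x2 average pooling."""
    def __init__(self, in_channels, compression=0.5):
        super().__init__()
        out_channels = int(in_channels * compression)
        self.norm = nn.BatchNorm2d(in_channels)
        self.relu = nn.ReLU(inplace=True)
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False)
        self.pool = nn.AvgPool2d(kernel_size=2, stride=2)

    def forward(self, x):
        return self.pool(self.conv(self.relu(self.norm(x))))
```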
A typical DenseNet architecture starts with an initial convolution and pooling stage, followed by multiple Dense Blocks interleaved with Transition Layers. Finally, a global average pooling layer and a fully connected classification layer (with a softmax over the class scores) produce the output predictions. The number of layers within each Dense Block varies with the specific configuration (e.g., DenseNet-121, DenseNet-169).
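Putting the pieces together, a toy DenseNet-style classifier could be assembled as below; the growth rate, block sizes, and class count are arbitrary illustration values, not the DenseNet-121 configuration, and the class reuses the DenseBlock and TransitionLayer sketches above.

```python
class TinyDenseNet(nn.Module):
    def __init__(self, num_classes=10, growth_rate=12, block_layers=(4, 4, 4)):
        super().__init__()
        channels = 2 * growth_rate
        # Initial convolution; larger inputs typically also use a strided conv plus max pooling.
        self.stem = nn.Conv2d(3, channels, kernel_size=3, padding=1, bias=False)

        stages = []
        for i, num_layers in enumerate(block_layers):
            stages.append(DenseBlock(channels, growth_rate, num_layers))
            channels += num_layers * growth_rate
            if i < len(block_layers) - 1:          # transitions only between blocks
                stages.append(TransitionLayer(channels))
                channels = int(channels * 0.5)
        self.features = nn.Sequential(*stages)

        self.norm = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)
        self.pool = nn.AdaptiveAvgPool2d(1)        # global average pooling
        self.classifier = nn.Linear(channels, num_classes)

    def forward(self, x):
        x = self.relu(self.norm(self.features(self.stem(x))))
        x = self.pool(x).flatten(1)
        return self.classifier(x)                  # logits; softmax is applied by the loss


# Example: a batch of two 32x32 RGB images produces logits of shape (2, 10).
logits = TinyDenseNet()(torch.randn(2, 3, 32, 32))
```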
DenseNets offer several advantages:
- Improved information and gradient flow: every layer has a short, direct path to its block's input and, through the chain of blocks, toward the loss, which eases the training of very deep networks.
- Feature reuse: later layers can build directly on the feature maps computed by earlier layers instead of relearning them.
- Parameter efficiency: combined with a small growth rate, feature reuse lets DenseNets reach competitive accuracy with substantially fewer parameters than comparable architectures.
However, there is a significant practical consideration: memory consumption during training. Because the concatenated feature maps from every preceding layer must be kept available for the backward pass, straightforward implementations can require considerably more GPU memory than addition-based networks of similar depth, although memory-efficient implementations mitigate this overhead.
While both ResNet and DenseNet use shortcut connections to improve information and gradient flow, their combination mechanisms differ fundamentally. ResNet combines the shortcut and the transformed features with element-wise addition (y = F(x) + x), which allows identity mappings where a layer's transformation can effectively be skipped. DenseNet uses channel-wise concatenation (x_l = H_l([x_0, ..., x_{l-1}])), forcing layers to aggregate features from all previous layers. This promotes aggressive feature reuse but results in wider layers (in terms of channels) as depth increases within a block. Both approaches have proven highly effective for training very deep networks.
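The practical difference between the two combination rules shows up directly in tensor shapes; the sizes below are arbitrary examples.

```python
x   = torch.randn(1, 64, 28, 28)             # incoming feature maps
f_x = torch.randn(1, 64, 28, 28)             # residual branch output F(x)
h_x = torch.randn(1, 32, 28, 28)             # dense layer output H_l(...) with k = 32

resnet_style   = x + f_x                     # element-wise addition: shape stays (1, 64, 28, 28)
densenet_style = torch.cat([x, h_x], dim=1)  # concatenation grows channels: (1, 96, 28, 28)
```

Addition requires the operands to have identical shapes, while concatenation only needs matching spatial dimensions and increases the channel count.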
In summary, DenseNet introduced a novel connectivity pattern that maximizes information flow between layers by connecting every layer directly with every other layer within a block in a feed-forward fashion. This design encourages feature reuse, leading to compact models that often achieve state-of-the-art results with fewer parameters, although potentially at the cost of higher memory consumption during training.