After applying convolutional layers, the resulting feature maps capture spatial hierarchies of patterns detected in the input. However, these feature maps can still be quite large, leading to high computational costs in subsequent layers. Furthermore, we often want our network to be somewhat robust to the exact position of a feature in the input; whether an edge is detected a few pixels to the left or right shouldn't drastically change the overall classification. This is where pooling layers come in.
Pooling layers perform a downsampling operation along the spatial dimensions (width and height) of the feature maps. They reduce the amount of information, keeping only the most salient parts, which helps to decrease computation, control overfitting, and introduce a degree of translation invariance. Unlike convolutional layers, standard pooling layers do not have learnable parameters; they apply a fixed operation.
The most common type of pooling is Max Pooling. It works by defining a window (or pool) size, typically 2x2, and a stride, often matching the pool size. This window slides over the input feature map. For each position of the window, the maximum value within that window is selected and becomes the corresponding element in the output feature map.
Consider a 4x4 feature map and a MaxPooling operation with a pool size of 2x2 and a stride of 2.
The result is a new 2x2 feature map where each value represents the maximum activation from a 2x2 region in the original 4x4 map.
Input Feature Map (4x4)      Max Operation (2x2 window, stride 2)   Output Feature Map (2x2)

[[ 1  3 | 2  4 ]             max(1, 3, 5, 7) = 7                    [[ 7  8 ]
 [ 5  7 | 6  8 ]             max(2, 4, 6, 8) = 8                     [ 9  6 ]]
 [------+------]
 [ 9  1 | 2  5 ]             max(9, 1, 3, 4) = 9
 [ 3  4 | 6  0 ]]            max(2, 5, 6, 0) = 6
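The same computation can be sketched in a few lines of plain NumPy, reshaping the map into non-overlapping 2x2 blocks and taking the maximum of each (a hand-rolled illustration, not how Keras implements it):

```python
import numpy as np

# The 4x4 feature map from the example above
fmap = np.array([[1, 3, 2, 4],
                 [5, 7, 6, 8],
                 [9, 1, 2, 5],
                 [3, 4, 6, 0]])

# Split into 2x2 blocks, then take the max over each block's two axes
pooled = fmap.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)
# [[7 8]
#  [9 6]]
```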
MaxPooling2D
Keras provides the keras.layers.MaxPooling2D layer for this operation. It is typically added after a convolutional layer (often paired with an activation function).
import keras
from keras import layers
# Example Usage within a Sequential model
model = keras.Sequential([
# Assuming input shape is (height, width, channels) e.g., (64, 64, 3)
layers.Input(shape=(64, 64, 3)),
# First Convolutional Block
layers.Conv2D(filters=32, kernel_size=(3, 3), activation='relu'),
# Apply MaxPooling
layers.MaxPooling2D(pool_size=(2, 2)), # Reduces spatial dimensions by factor of 2
# Second Convolutional Block
layers.Conv2D(filters=64, kernel_size=(3, 3), activation='relu'),
layers.MaxPooling2D(pool_size=(2, 2)), # Reduces again
# ... potentially more layers ...
layers.Flatten(),
layers.Dense(10, activation='softmax') # Example output layer
])
model.summary()
The primary arguments for MaxPooling2D are:

- pool_size: A tuple specifying the height and width of the pooling window (e.g., (2, 2)).
- strides: A tuple specifying the step size for the window's movement in height and width. If None (the default), it defaults to pool_size, which is common practice for non-overlapping pooling.
- padding: Similar to Conv2D, either 'valid' (no padding; the output shrinks if the window and stride don't fit exactly, giving floor((input_dim - pool_size) / stride) + 1) or 'same' (pads with zeros so the output has dimensions ceil(input_dim / stride)). 'valid' is the default.

Let's examine the shape transformation caused by MaxPooling2D with default strides (strides=pool_size) and padding='valid':
# Example demonstrating shape change
input_shape = (1, 28, 28, 16) # Batch=1, Height=28, Width=28, Channels=16
input_tensor = keras.random.uniform(input_shape)
# Apply MaxPooling2D with a 2x2 pool
pooling_layer = layers.MaxPooling2D(pool_size=(2, 2))
output_tensor = pooling_layer(input_tensor)
print(f"Input shape: {input_tensor.shape}")
print(f"Output shape after MaxPooling2D(2,2): {output_tensor.shape}")
# Apply MaxPooling2D with a 3x3 pool
pooling_layer_3x3 = layers.MaxPooling2D(pool_size=(3, 3)) # Strides default to (3, 3)
output_tensor_3x3 = pooling_layer_3x3(input_tensor)
print(f"Output shape after MaxPooling2D(3,3): {output_tensor_3x3.shape}")
# --- Expected Output ---
# Input shape: (1, 28, 28, 16)
# Output shape after MaxPooling2D(2,2): (1, 14, 14, 16)
# Output shape after MaxPooling2D(3,3): (1, 9, 9, 16)  # floor((28 - 3) / 3) + 1 = 9
Notice how the spatial dimensions (height and width) are reduced according to the pool_size and strides (implicitly pool_size here), while the number of channels (feature maps) remains unchanged.
While MaxPooling is very common, alternatives exist:

- AveragePooling2D: Calculates the average value within the pooling window instead of the maximum. It provides smoother downsampling but can dilute very strong, localized features.
- GlobalMaxPooling2D, GlobalAveragePooling2D: These layers pool across the entire spatial dimensions of a feature map, reducing each feature map to a single value (either the max or the average). They are often used as an alternative to Flatten just before the final Dense classification layers, significantly reducing the number of parameters.

In practice, MaxPooling is frequently the default choice in CNNs for image classification tasks due to its effectiveness in summarizing the most active features and providing robustness. You will typically see Conv2D layers followed by MaxPooling2D repeated several times to progressively reduce spatial resolution while increasing the number of feature channels.