The convolutional and pooling layers you've learned about act as powerful feature extractors. They process input images (or other grid-like data) and produce feature maps: multi-dimensional tensors representing detected patterns such as edges, textures, or more complex shapes. For instance, after several `Conv2D` and `MaxPooling2D` layers, you might have a tensor with a shape like (height, width, channels), say (7, 7, 64). This tensor retains spatial information: the values in the 64 channels correspond to specific features detected at different locations within the downsampled 7×7 grid.
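To see how a shape like (7, 7, 64) can arise, the spatial sizes can be traced with a little arithmetic. The sketch below assumes a hypothetical base with two `padding="same"` convolutions and two 2×2 poolings on a 28×28 input; the helper names (`conv2d_out`, `pool_out`) are ours for illustration, not part of Keras.

```python
def conv2d_out(size, kernel, padding="valid", stride=1):
    # Spatial output size of a square Conv2D layer (simplified).
    if padding == "same":
        return -(-size // stride)  # ceil division; size unchanged for stride 1
    return (size - kernel) // stride + 1

def pool_out(size, pool=2):
    # MaxPooling2D with pool_size=2 halves the spatial size (floor).
    return size // pool

s = 28                                           # 28x28 input
s = pool_out(conv2d_out(s, 3, padding="same"))   # Conv2D + MaxPooling2D -> 14
s = pool_out(conv2d_out(s, 3, padding="same"))   # Conv2D + MaxPooling2D -> 7
print((s, s, 64))  # (7, 7, 64) if the last conv layer has 64 filters
```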
However, for tasks like classification, we typically need to make a final prediction based on the entire set of extracted features. Standard fully connected (`Dense`) layers expect their input as a one-dimensional vector, where each element represents a single feature value, without any inherent 2D or 3D spatial structure. The output of our convolutional base, being a multi-dimensional tensor like (7, 7, 64), isn't directly compatible with `Dense` layers.
This is where the `Flatten` layer comes in. Its job is simple but essential: it takes the multi-dimensional output from the convolutional base and reshapes, or "flattens," it into a single, long one-dimensional vector. It does this by essentially unstacking the elements row by row, channel by channel.
For example, if the input tensor has shape (height, width, channels), the `Flatten` layer will produce a vector of length height × width × channels. Using our previous example of a (7, 7, 64) tensor, the `Flatten` layer would transform it into a vector with 7 × 7 × 64 = 3136 elements.
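The length arithmetic can be checked without Keras; here is a minimal NumPy sketch mimicking what `Flatten` does to a single (7, 7, 64) sample:

```python
import numpy as np

# A dummy feature map with the shape from the text: (7, 7, 64)
feature_map = np.zeros((7, 7, 64))

# Per sample, Flatten is equivalent to a row-major reshape to 1D
flat = feature_map.reshape(-1)

print(flat.shape)   # (3136,)
print(7 * 7 * 64)   # 3136
```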
*Figure: the Flatten layer reshaping the multi-dimensional output of the convolutional base into a 1D vector.*
The `Flatten` layer itself doesn't learn anything; it contains no trainable weights. It's purely a structural transformation required to connect the feature extraction part of the CNN (the convolutional base) to the classification or regression part (the head).
Once the feature maps are flattened into a vector, we can feed this vector into one or more standard `Dense` (fully connected) layers. These layers work just like the ones you encountered in basic neural networks. Each neuron in a `Dense` layer receives input from all neurons in the previous layer (in this case, all elements of the flattened vector).
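As a rough sketch of that connectivity, a `Dense` layer with 128 units computes relu(Wx + b), where every output unit sees all 3136 flattened inputs. The weights below are random placeholders standing in for learned parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(3136)           # the flattened feature vector
W = rng.standard_normal((128, 3136))    # one row of weights per Dense unit
b = np.zeros(128)                       # one bias per unit

h = np.maximum(0.0, W @ x + b)          # relu(Wx + b)
print(h.shape)  # (128,)
```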
The purpose of these `Dense` layers in a CNN architecture is to learn combinations of the features extracted by the convolutional base. While the convolutional layers learned local patterns (edges and textures in small patches), the `Dense` layers learn global patterns across the entire input image, based on which features were activated where. They combine these high-level features to make the final prediction.
Typically, a CNN includes one or more `Dense` layers after the `Flatten` layer:

- **Hidden layers:** one or more `Dense` layers with an activation function (commonly `relu`) to introduce non-linearity and learn complex feature combinations. The number of units in these layers is a hyperparameter to be tuned (e.g., 128, 256, 512).
- **Binary classification output:** a final `Dense` layer with a single unit and a `sigmoid` activation.
- **Multi-class classification output:** a final `Dense` layer with `N` units (where `N` is the number of classes) and a `softmax` activation.

Adding `Flatten` and `Dense` layers to a Keras model is straightforward. You simply add them after the last convolutional or pooling layer. Here's how it looks in a Sequential model context:
import keras
from keras import layers
# Build a Sequential model: a Conv2D/MaxPooling2D base plus a classifier head
# Example input shape to the model: (28, 28, 1) for MNIST
model = keras.Sequential(
[
keras.Input(shape=(28, 28, 1)),
# --- Convolutional Base ---
layers.Conv2D(32, kernel_size=(3, 3), activation="relu"),
layers.MaxPooling2D(pool_size=(2, 2)),
layers.Conv2D(64, kernel_size=(3, 3), activation="relu"),
layers.MaxPooling2D(pool_size=(2, 2)),
# --- Classifier Head ---
layers.Flatten(), # Flatten the 3D feature map to 1D
layers.Dropout(0.5), # Optional: Dropout for regularization
layers.Dense(10, activation="softmax"), # Output layer for 10 classes (e.g., MNIST digits)
]
)
model.summary()
In this example:

- `layers.Flatten()` takes the output of the last `MaxPooling2D` layer and converts it into a 1D vector.
- A `layers.Dropout(0.5)` layer is optionally added for regularization (we'll cover this in Chapter 6).
- The `layers.Dense(10, activation="softmax")` layer performs the classification, outputting probabilities for each of the 10 classes.

The combination of the convolutional base for feature extraction and the `Flatten` plus `Dense` layers for classification forms the standard architecture for many successful CNNs used in image recognition and other domains.
A Note on Alternatives: While `Flatten` followed by `Dense` is common, other techniques like `GlobalAveragePooling2D` or `GlobalMaxPooling2D` exist. These layers also bridge the gap between convolutional feature maps and the final output, often reducing the number of parameters and potentially improving generalization. They work by taking the average or maximum value across the spatial dimensions (height, width) of each feature map, resulting in a single value per channel and thus creating a 1D vector directly. We focus on `Flatten` here as it's a fundamental concept, but be aware of these alternatives for more advanced architectures.
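For intuition, the effect of these global pooling layers on a single (7, 7, 64) sample can be mimicked in NumPy (a shapes-only sketch; the real Keras layers also handle the batch dimension):

```python
import numpy as np

feature_map = np.arange(7 * 7 * 64, dtype=float).reshape(7, 7, 64)

# GlobalAveragePooling2D analogue: average over the spatial axes of each channel
gap = feature_map.mean(axis=(0, 1))   # shape (64,)

# GlobalMaxPooling2D analogue: maximum over the spatial axes of each channel
gmp = feature_map.max(axis=(0, 1))    # shape (64,)

print(gap.shape, gmp.shape)  # (64,) (64,)
```

Note the contrast with `Flatten`: instead of a 3136-element vector, global pooling yields just 64 values, one summary statistic per channel.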
© 2025 ApX Machine Learning