Convolutional and pooling layers act as powerful feature extractors. These layers process input images (or other grid-like data) and produce feature maps, multi-dimensional tensors representing detected patterns like edges, textures, or more complex shapes. For instance, after several Conv2D and MaxPooling2D layers, one might have a tensor with a shape like (height,width,channels), say (7,7,64). This tensor retains spatial information; the values in the 64 channels correspond to specific features detected at different locations within the downsampled 7×7 grid.
However, for tasks like classification, we typically need to make a final prediction based on the entire set of extracted features. Standard fully connected (Dense) layers expect their input as a one-dimensional vector, where each element represents a single feature value, without any inherent 2D or 3D spatial structure. The output of our convolutional base, being a multi-dimensional tensor like (7,7,64), isn't directly compatible with Dense layers.
This is where the Flatten layer comes in. Its job is simple but essential: it takes the multi-dimensional output from the convolutional base and reshapes, or "flattens," it into a single, long one-dimensional vector. It does this by essentially unstacking the elements row by row, channel by channel.
For example, if the input tensor has shape (height,width,channels), the Flatten layer will produce a vector of length height×width×channels. Using our previous example of a (7,7,64) tensor, the Flatten layer would transform it into a vector with 7×7×64=3136 elements.
Diagram: the Flatten layer reshaping the multi-dimensional output of the convolutional base into a 1D vector.
The Flatten layer itself doesn't learn anything; it contains no trainable weights. It's purely a structural transformation required to connect the feature extraction part of the CNN (the convolutional base) to the classification or regression part (the head).
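Because Flatten is just a reshape with no learned parameters, its effect can be sketched in a few lines of NumPy. This is an illustrative stand-in for the layer, not Keras's implementation; the (7, 7, 64) shape comes from the running example above.

```python
import numpy as np

# One (7, 7, 64) feature map, standing in for the output
# of the convolutional base in the example above.
feature_map = np.arange(7 * 7 * 64).reshape(7, 7, 64)

# Flatten is a pure reshape: height * width * channels values
# become a single 1D vector, with no weights involved.
flat = feature_map.reshape(-1)

print(flat.shape)  # (3136,) since 7 * 7 * 64 = 3136
```

Every value from the feature maps is preserved; only the arrangement changes, which is why the layer adds zero trainable parameters to the model.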
Once the feature maps are flattened into a vector, we can feed this vector into one or more standard Dense (fully connected) layers. These layers work just like the ones you encountered in basic neural networks. Each neuron in a Dense layer receives input from all neurons in the previous layer (in this case, all elements of the flattened vector).
The purpose of these Dense layers in a CNN architecture is to learn combinations of the features extracted by the convolutional base. While the convolutional layers learned local patterns (edges, textures in small patches), the Dense layers learn global patterns across the entire input image, based on which features were activated where. They combine these high-level features to make the final prediction.
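To make the "combinations of features" idea concrete, here is a minimal NumPy sketch of what a single Dense layer computes on the flattened vector: a weighted sum of all inputs per unit, plus a bias, passed through an activation. The 3136-element input and the 128-unit size are illustrative choices, not taken from the model in the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Flattened feature vector: 3136 values (7 * 7 * 64).
x = rng.standard_normal(3136)

# A hypothetical Dense layer with 128 units: every unit sees every
# element of x via the weight matrix W, then adds a bias and applies ReLU.
W = rng.standard_normal((3136, 128)) * 0.01
b = np.zeros(128)
hidden = np.maximum(0.0, x @ W + b)  # ReLU(x @ W + b)

print(hidden.shape)  # (128,)
```

Each of the 128 outputs depends on all 3136 input features, which is exactly how the Dense head can learn global combinations of the locally detected patterns.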
Typically, a CNN includes one or more Dense layers after the Flatten layer:
- One or more hidden Dense layers with a ReLU activation (activation="relu") to introduce non-linearity and learn complex feature combinations. The number of units in these layers is a hyperparameter to tune (e.g., 128, 256, 512).
- A final output Dense layer whose configuration depends on the task: for binary classification, a single unit with a sigmoid activation; for multi-class classification, N units (where N is the number of classes) with a softmax activation.

Adding Flatten and Dense layers to a Keras model is straightforward. You simply add them after the last convolutional or pooling layer. Here's how it looks in a Sequential model context:
import keras
from keras import layers
# A small Sequential model: a convolutional base followed by a classifier head.
# Example input shape: (28, 28, 1) for MNIST.
model = keras.Sequential(
[
keras.Input(shape=(28, 28, 1)),
# --- Convolutional Base ---
layers.Conv2D(32, kernel_size=(3, 3), activation="relu"),
layers.MaxPooling2D(pool_size=(2, 2)),
layers.Conv2D(64, kernel_size=(3, 3), activation="relu"),
layers.MaxPooling2D(pool_size=(2, 2)),
# --- Classifier Head ---
layers.Flatten(), # Flatten the 3D feature map to 1D
layers.Dropout(0.5), # Optional: Dropout for regularization
layers.Dense(10, activation="softmax"), # Output layer for 10 classes (e.g., MNIST digits)
]
)
model.summary()
In this example:
- layers.Flatten() takes the output of the last MaxPooling2D layer and converts it into a 1D vector.
- The layers.Dropout(0.5) layer is optionally added for regularization (we'll cover this in Chapter 6).
- The layers.Dense(10, activation="softmax") layer performs the classification, outputting probabilities for each of the 10 classes.

The combination of the convolutional base for feature extraction and the Flatten plus Dense layers for classification forms the standard architecture for many successful CNNs used in image recognition and other domains.
A Note on Alternatives: While Flatten followed by Dense is common, other techniques like GlobalAveragePooling2D or GlobalMaxPooling2D exist. These layers also bridge the gap between convolutional maps and the final output, often reducing the number of parameters and potentially improving generalization. They work by taking the average or maximum value across the spatial dimensions (height, width) of each feature map, resulting in a single value per channel, thus creating a 1D vector directly. We focus on Flatten here as it's a fundamental concept, but be aware of these alternatives for more advanced architectures.
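The pooling alternatives can also be sketched in NumPy: averaging (or taking the maximum) over the two spatial axes collapses each feature map to one number per channel. The (7, 7, 64) shape is again just the running example.

```python
import numpy as np

rng = np.random.default_rng(0)
feature_maps = rng.random((7, 7, 64))

# GlobalAveragePooling2D: average over height and width (axes 0 and 1),
# leaving one value per channel -- a length-64 vector instead of 3136.
avg_pooled = feature_maps.mean(axis=(0, 1))

# GlobalMaxPooling2D: same idea, but taking the maximum per channel.
max_pooled = feature_maps.max(axis=(0, 1))

print(avg_pooled.shape)  # (64,)
print(max_pooled.shape)  # (64,)
```

Feeding a 64-element vector into the Dense head instead of a 3136-element one shrinks the first Dense layer's weight matrix by a factor of 49, which is where the parameter savings mentioned above come from.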