Now that you understand the core components of Convolutional Neural Networks (CNNs), such as Conv2D and MaxPooling2D layers, let's put them into practice. This section guides you through building, training, and evaluating a CNN for an image classification task using Keras. We'll use the CIFAR-10 dataset, a common benchmark for image classification models.
CIFAR-10 consists of 60,000 32x32 pixel color images distributed across 10 distinct classes (e.g., airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck). There are 6,000 images per class, split into 50,000 training images and 10,000 testing images. Our goal is to train a CNN that can correctly classify these small images.
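The labels themselves are just integers from 0 to 9. For readability later on, you might keep a small list that maps those indices to the class names above. This class_names list is an illustrative helper, not part of Keras; the ordering follows the standard CIFAR-10 label assignment.
# Illustrative helper: map integer label indices (0-9) to CIFAR-10 class names.
# The ordering follows the standard CIFAR-10 label assignment.
class_names = [
    "airplane", "automobile", "bird", "cat", "deer",
    "dog", "frog", "horse", "ship", "truck",
]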
First, we need to import the necessary libraries from Keras and load the dataset. Keras conveniently provides access to several standard datasets, including CIFAR-10.
import numpy as np
import keras
from keras import layers
from keras.datasets import cifar10
from keras.utils import to_categorical
# Load the CIFAR-10 dataset
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
print(f"Training data shape: {x_train.shape}")
print(f"Training labels shape: {y_train.shape}")
print(f"Test data shape: {x_test.shape}")
print(f"Test labels shape: {y_test.shape}")
print(f"Number of training samples: {x_train.shape[0]}")
print(f"Number of test samples: {x_test.shape[0]}")
You should see output indicating the shape of the data: 50,000 training images and 10,000 test images, each being 32x32 pixels with 3 color channels (RGB). The labels are initially integers from 0 to 9.
Neural networks generally perform better when input data is scaled appropriately. We'll normalize the pixel values from the range [0, 255] to [0, 1]. Additionally, since we have 10 distinct classes and will use categorical crossentropy as our loss function, we need to convert the integer labels into a one-hot encoded format.
# Normalize pixel values to be between 0 and 1
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0
# Convert class vectors to binary class matrices (one-hot encoding)
num_classes = 10
y_train = to_categorical(y_train, num_classes)
y_test = to_categorical(y_test, num_classes)
print(f"Sample training label (original): {y_train[0].argmax()}") # Example original label index
print(f"Sample training label (one-hot): {y_train[0]}") # Example one-hot encoded label
Normalization helps stabilize the training process, and one-hot encoding represents each label as a vector where only the element corresponding to the class index is 1 and all others are 0. This format is required by the categorical_crossentropy loss function.
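As a quick standalone check of what to_categorical produces, you can encode a single label and inspect the result (the label value 3 here is just an illustrative example):
# Illustrative check: the label 3 becomes a 10-element vector with a 1 at index 3.
example = to_categorical([3], num_classes=10)
print(example)  # [[0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]]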
Now, let's define the architecture of our CNN using the Keras Sequential API. We'll stack Conv2D and MaxPooling2D layers, followed by Flatten and Dense layers for classification.
input_shape = x_train.shape[1:] # Should be (32, 32, 3)
model = keras.Sequential(
    [
        keras.Input(shape=input_shape),
        layers.Conv2D(32, kernel_size=(3, 3), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Conv2D(64, kernel_size=(3, 3), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Flatten(),
        layers.Dropout(0.5),  # Add dropout for regularization
        layers.Dense(num_classes, activation="softmax"),
    ]
)
model.summary()
Let's break down this architecture:

- keras.Input(shape=input_shape): Defines the expected input shape for the model.
- Conv2D(32, kernel_size=(3, 3), activation="relu"): The first convolutional layer uses 32 filters of size 3x3 and applies the ReLU activation function. The default padding is 'valid', meaning the output spatial dimensions shrink slightly (see the shape-check sketch after this list).
- MaxPooling2D(pool_size=(2, 2)): Reduces the spatial dimensions (height and width) of the feature maps by half.
- Conv2D(64, kernel_size=(3, 3), activation="relu"): The second convolutional layer uses 64 filters. It's common practice to increase the number of filters in deeper layers as the spatial dimensions decrease.
- MaxPooling2D(pool_size=(2, 2)): Another pooling layer for further downsampling.
- Flatten(): Converts the 2D feature maps from the pooling layer into a 1D vector, preparing it for the fully connected layers.
- Dropout(0.5): A regularization technique where randomly selected neurons are ignored during training (set to 0) for each update cycle. This helps prevent overfitting by reducing the codependency between neurons. A rate of 0.5 means half of the input units are dropped.
- Dense(num_classes, activation="softmax"): The final output layer has num_classes (10) units, one for each class. The softmax activation function outputs a probability distribution over the classes, indicating the model's confidence for each class.

The model.summary() output provides a useful overview of the layers, their output shapes, and the number of trainable parameters.
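To make the 'valid' padding and pooling arithmetic concrete, here is a minimal sketch that traces the spatial size through the network by hand. The conv_out helper is just an illustrative function using the standard output-size formula; it is not part of Keras, and the resulting numbers should match what model.summary() reports.
# Trace the feature-map height/width through the network by hand.
# For 'valid' padding: output_size = floor((input_size - kernel_size) / stride) + 1
def conv_out(size, kernel, stride=1):
    return (size - kernel) // stride + 1

h = 32                # input height (and width) of a CIFAR-10 image
h = conv_out(h, 3)    # Conv2D(32, 3x3, 'valid') -> 30
h = h // 2            # MaxPooling2D(2x2)        -> 15
h = conv_out(h, 3)    # Conv2D(64, 3x3, 'valid') -> 13
h = h // 2            # MaxPooling2D(2x2)        -> 6
print(h, h * h * 64)  # 6 and 2304: the length of the vector after Flatten()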
Before training, we need to configure the learning process using the compile() method. This involves specifying the optimizer, the loss function, and any metrics to monitor during training.
# Compile the model
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
loss="categorical_crossentropy"
: This loss function is suitable for multi-class classification problems where labels are one-hot encoded. It measures the difference between the predicted probability distribution (from softmax) and the true distribution.optimizer="adam"
: Adam is a popular and generally effective optimization algorithm that adapts the learning rate during training.metrics=["accuracy"]
: We ask the model to report the classification accuracy during training and evaluation.Now we train the model using the fit()
Now we train the model using the fit() method, providing the training data, batch size, number of epochs, and a validation split.
batch_size = 128
epochs = 15
print("Starting training...")
history = model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, validation_split=0.1)
print("Training finished.")
- x_train, y_train: The training images and their corresponding one-hot encoded labels.
- batch_size=128: The number of samples processed in each iteration (gradient update). Larger batch sizes can speed up training but may require more memory.
- epochs=15: The number of times the model will iterate over the entire training dataset.
- validation_split=0.1: Instead of using the separate test set for validation during training (which risks indirectly fitting to the test set), we reserve 10% of the training data as a validation set. This data is not used for training the model weights but is evaluated at the end of each epoch to monitor performance on unseen data and detect potential overfitting.

The fit() method returns a History object, which contains records of the training and validation loss and metrics at each epoch.
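You can inspect those records directly; a minimal sketch is shown below. The exact key names (such as 'val_accuracy') come from the metric names Keras records and may differ in older versions, which used 'acc' and 'val_acc'.
# Inspect what was recorded during training.
print(history.history.keys())  # e.g. dict_keys(['loss', 'accuracy', 'val_loss', 'val_accuracy'])
print(f"Final validation accuracy: {history.history['val_accuracy'][-1]:.4f}")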
After training, we evaluate the model's performance on the held-out test set (x_test, y_test), which the model has never seen before.
# Evaluate the model on the test data
score = model.evaluate(x_test, y_test, verbose=0)
print(f"Test loss: {score[0]:.4f}")
print(f"Test accuracy: {score[1]:.4f}")
This provides the final loss and accuracy on the test set, giving an unbiased estimate of the model's generalization ability. You should expect the test accuracy to be lower than the training accuracy, especially if overfitting occurred. With this simple CNN and limited epochs, accuracy might be around 65-75%. More complex architectures and techniques discussed later (like data augmentation and more sophisticated regularization) can significantly improve this.
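To go beyond aggregate metrics, you can also inspect individual predictions. The following minimal sketch compares the predicted and true class indices for the first few test images (the choice of five images is arbitrary).
# Compare predicted and true class indices for the first few test images.
probs = model.predict(x_test[:5])        # shape (5, 10): one probability distribution per image
predicted = np.argmax(probs, axis=1)
actual = np.argmax(y_test[:5], axis=1)   # y_test is one-hot encoded, so argmax recovers the index
print(f"Predicted: {predicted}")
print(f"Actual:    {actual}")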
Plotting the training and validation accuracy and loss over epochs helps understand the training dynamics and identify overfitting.
[Figure: Accuracy curves for training and validation sets over 15 epochs.]
[Figure: Loss curves for training and validation sets over 15 epochs.]
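Plots like these can be reproduced from the History object. Here is a minimal sketch, assuming matplotlib is installed and that your Keras version records the metrics under the 'accuracy'/'val_accuracy' keys.
import matplotlib.pyplot as plt

# Plot training vs. validation accuracy and loss over the epochs.
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))
ax1.plot(history.history["accuracy"], label="train")
ax1.plot(history.history["val_accuracy"], label="validation")
ax1.set_xlabel("Epoch")
ax1.set_ylabel("Accuracy")
ax1.legend()
ax2.plot(history.history["loss"], label="train")
ax2.plot(history.history["val_loss"], label="validation")
ax2.set_xlabel("Epoch")
ax2.set_ylabel("Loss")
ax2.legend()
plt.show()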
Ideally, both training and validation accuracy should increase while loss decreases. If the validation accuracy plateaus or starts decreasing while the training accuracy continues to rise (and validation loss increases), it's a sign of overfitting. Our example plots show typical behavior: initial rapid improvement followed by slower gains. The gap between training and validation metrics indicates some overfitting, which the Dropout layer helps mitigate. Techniques in Chapter 6 will explore how to further improve generalization.
This practical exercise demonstrates the end-to-end process of using a CNN for image classification with Keras. You've loaded data, preprocessed it, built a standard CNN architecture, compiled it, trained it, and evaluated its performance. This forms a solid foundation for tackling more complex image-related problems.