Let's put the techniques discussed in this chapter into practice. We'll revisit a model built earlier in the course, specifically the Convolutional Neural Network (CNN) for image classification from Chapter 4 (or a similar model you've worked with), and apply several methods to potentially improve its performance and make the training process more robust. The goal is not just to get a higher accuracy number, but to build a model that generalizes better to unseen data and to streamline our development workflow.
We'll assume we have a baseline CNN trained on a dataset like CIFAR-10. Often, a common issue with such models, especially if trained for many epochs, is overfitting: the model performs well on the training data but poorly on the validation or test data. Our baseline model might exhibit this behavior.
First, let's establish our starting point. Imagine our baseline CNN (from Chapter 4) achieved approximately 70% accuracy on the CIFAR-10 validation set after 20 epochs, but the validation accuracy plateaued or even started decreasing while the training accuracy continued to climb. This divergence is a classic sign of overfitting.
Our task is to rebuild and retrain this model using Dropout, Data Augmentation, and Callbacks to address overfitting and manage the training process. We'll use Keras 3 and take advantage of its backend flexibility: the code below is backend-agnostic and runs unchanged on either the TensorFlow or PyTorch backend.
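For example, one way to select the backend is to set the KERAS_BACKEND environment variable before Keras is imported for the first time (PyTorch is shown here purely as an illustrative choice):
import os
# Must be set before the first `import keras`; "tensorflow" works just as well
os.environ["KERAS_BACKEND"] = "torch"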
# Assume necessary imports: keras, layers, torch (or tensorflow)
# Assume cifar10 dataset is loaded and preprocessed into:
# x_train, y_train, x_val, y_val, x_test, y_test
# (with pixel values scaled, e.g., to [0, 1], and labels one-hot encoded)
import keras
from keras import layers
# import torch # If using PyTorch backend
# Example: Define input shape and number of classes for CIFAR-10
input_shape = (32, 32, 3)
num_classes = 10
# Baseline model structure (Simplified from Chapter 4 for illustration)
# baseline_model = keras.Sequential(
#     [
#         keras.Input(shape=input_shape),
#         layers.Conv2D(32, kernel_size=(3, 3), activation="relu"),
#         layers.MaxPooling2D(pool_size=(2, 2)),
#         layers.Conv2D(64, kernel_size=(3, 3), activation="relu"),
#         layers.MaxPooling2D(pool_size=(2, 2)),
#         layers.Flatten(),
#         layers.Dense(128, activation="relu"),
#         layers.Dense(num_classes, activation="softmax"),
#     ]
# )
# baseline_model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
# history_baseline = baseline_model.fit(x_train, y_train, batch_size=128, epochs=20, validation_data=(x_val, y_val))
# Baseline evaluation: baseline_loss, baseline_acc = baseline_model.evaluate(x_test, y_test)
# print(f"Baseline Test Accuracy: {baseline_acc:.4f}") # Example result: ~0.68
To combat overfitting, we'll introduce Dropout layers. Dropout randomly sets a fraction of input units to 0 at each update during training, which helps prevent complex co-adaptations on the training data. We'll add Dropout after the pooling layers and before the final dense layer.
# Improved model with Dropout
improved_model = keras.Sequential(
    [
        keras.Input(shape=input_shape),
        layers.Conv2D(32, kernel_size=(3, 3), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        # Dropout after pooling
        layers.Dropout(0.25),  # Dropout rate of 25%
        layers.Conv2D(64, kernel_size=(3, 3), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        # Dropout after pooling
        layers.Dropout(0.25),  # Dropout rate of 25%
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        # Dropout before final dense layer
        layers.Dropout(0.5),  # Higher dropout rate for dense layers is common
        layers.Dense(num_classes, activation="softmax"),
    ]
)
improved_model.summary()
The dropout rates (0.25, 0.5) are common starting points; they are hyperparameters that could be tuned later.
Data augmentation artificially expands the training dataset by creating modified versions of existing images. This makes the model more invariant to transformations like rotations, shifts, or flips, improving generalization. Keras offers preprocessing layers that can be included directly in the model definition or applied to the dataset pipeline. Adding them to the model is convenient: the random augmentation layers are only active during training and are automatically skipped at inference time.
# Model definition incorporating Data Augmentation layers
data_augmentation = keras.Sequential(
    [
        layers.RandomFlip("horizontal"),  # input shape is provided by keras.Input in the full model
        layers.RandomRotation(0.1),
        layers.RandomZoom(0.1),
        # Add other augmentations as needed (e.g., RandomContrast)
    ]
)
improved_model_with_aug = keras.Sequential(
    [
        keras.Input(shape=input_shape),
        # Apply augmentation layers first
        data_augmentation,
        # Rest of the model structure (with Dropout)
        layers.Conv2D(32, kernel_size=(3, 3), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Dropout(0.25),
        layers.Conv2D(64, kernel_size=(3, 3), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Dropout(0.25),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),
    ]
)
improved_model_with_aug.summary()
Now, the model will see slightly different versions of the training images in each epoch.
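As mentioned above, the same augmentation layers could instead be applied in the input pipeline rather than inside the model. A minimal sketch of that alternative, assuming the TensorFlow backend and a tf.data pipeline built from the NumPy arrays, might look like this:
import tensorflow as tf

# Build a tf.data pipeline and apply the augmentation layers on the fly;
# training=True ensures the random transformations are actually applied.
train_ds = tf.data.Dataset.from_tensor_slices((x_train, y_train)).shuffle(1024).batch(128)
train_ds = train_ds.map(
    lambda x, y: (data_augmentation(x, training=True), y),
    num_parallel_calls=tf.data.AUTOTUNE,
).prefetch(tf.data.AUTOTUNE)
# The model would then omit the data_augmentation block and be trained with model.fit(train_ds, ...).
With this approach the augmentation typically runs on the CPU as part of the data pipeline, whereas the in-model approach runs on the same device as the model and is carried along when the model is saved.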
Callbacks allow us to automate actions during training. We'll use ModelCheckpoint to save the model weights only when validation performance improves, and EarlyStopping to halt training if validation performance stops improving for a defined number of epochs.
# Define callbacks
callbacks = [
    keras.callbacks.ModelCheckpoint(
        filepath="best_model.keras",  # Path to save the model file
        save_best_only=True,          # Only save when val_loss improves
        monitor="val_loss",           # Metric to monitor
        verbose=1,                    # Log when saving
    ),
    keras.callbacks.EarlyStopping(
        monitor="val_loss",           # Metric to monitor
        patience=10,                  # Number of epochs with no improvement to wait
        verbose=1,                    # Log when stopping
        restore_best_weights=True,    # Restore weights from the epoch with the best val_loss
    ),
    # Optional: Add TensorBoard callback
    # keras.callbacks.TensorBoard(log_dir="./logs"),
]
# Compile the improved model (with augmentation and dropout)
improved_model_with_aug.compile(
    loss="categorical_crossentropy",
    optimizer="adam",
    metrics=["accuracy"],
)
Now we train the enhanced model using the fit method, passing our callbacks list. We might need to train for more epochs initially, as regularization and augmentation can sometimes slow down convergence, but EarlyStopping will prevent excessive training.
# Train the model
print("Training the improved model...")
history_improved = improved_model_with_aug.fit(
    x_train, y_train,
    batch_size=128,
    epochs=50,  # Allow more epochs; EarlyStopping will manage it
    validation_data=(x_val, y_val),
    callbacks=callbacks,  # Include our defined callbacks
)
# Note: EarlyStopping might stop training before 50 epochs.
# The 'best_model.keras' file now contains the weights from the epoch with the lowest val_loss.
# Option 1: EarlyStopping restored best weights automatically if restore_best_weights=True
print("\nEvaluating the model with restored best weights (due to EarlyStopping)...")
improved_loss, improved_acc = improved_model_with_aug.evaluate(x_test, y_test, verbose=0)
print(f"Improved Test Accuracy (from EarlyStopping): {improved_acc:.4f}")
# Option 2: Explicitly load the best model saved by ModelCheckpoint
# print("\nLoading the best model saved by ModelCheckpoint and evaluating...")
# best_model = keras.models.load_model("best_model.keras")
# improved_loss, improved_acc = best_model.evaluate(x_test, y_test, verbose=0)
# print(f"Improved Test Accuracy (from best_model.keras): {improved_acc:.4f}")
# Compare with baseline (example hypothetical results)
# Baseline Test Accuracy: ~0.68
# Improved Test Accuracy: ~0.75 (Expected improvement)
You should observe that the gap between training and validation accuracy is smaller than in the baseline model, indicating reduced overfitting, and the final test accuracy should also improve over the baseline.
Let's visualize the training history to see the impact. We expect the validation curve of the improved model to track the training curve more closely and potentially reach a higher peak.
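A minimal plotting sketch, assuming matplotlib is installed and that history_baseline and history_improved hold the History objects from the baseline and improved fit calls (the baseline fit was shown commented out earlier), might look like this:
import matplotlib.pyplot as plt

# History.history is a dict of per-epoch metric lists recorded by fit()
plt.plot(history_baseline.history["val_accuracy"], label="Baseline (val)")
plt.plot(history_improved.history["val_accuracy"], label="Improved (val)")
plt.plot(history_baseline.history["accuracy"], linestyle="--", label="Baseline (train)")
plt.plot(history_improved.history["accuracy"], linestyle="--", label="Improved (train)")
plt.xlabel("Epoch")
plt.ylabel("Accuracy")
plt.legend()
plt.title("Baseline vs. improved model")
plt.show()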
Hypothetical comparison of training and validation accuracy curves for the baseline model (blue) and the improved model (green) with regularization and augmentation. Note how the improved model shows less overfitting (smaller gap between train/val curves) and achieves better validation accuracy. Early stopping might terminate the improved model's training around epoch 25-30.
After evaluation, if satisfied, you can save the final trained model (which might already be saved as best_model.keras thanks to ModelCheckpoint).
# If you didn't use ModelCheckpoint or want to save the current state
# improved_model_with_aug.save("final_improved_cnn_cifar10.keras")
# loaded_model = keras.models.load_model("final_improved_cnn_cifar10.keras")
# print("Model saved and re-loaded successfully.")
While we've applied several effective techniques, achieving optimal performance often requires tuning hyperparameters. This could involve experimenting with:
- Different Dropout rates.
- The types and strength of Data Augmentation.
- The learning rate (Adam's default learning rate might not be optimal).
- Other optimizers (RMSprop, SGD with momentum).
Tools like KerasTuner or Optuna can help automate this search process, as sketched below, but even manual experimentation based on the validation results observed during training (perhaps using TensorBoard) can yield significant gains.
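As an illustration, a minimal KerasTuner sketch for searching over the dropout rates and the learning rate might look like the following. It assumes the keras_tuner package is installed; the search ranges, trial count, and directory names are arbitrary illustrative choices, not recommendations:
import keras_tuner as kt

def build_model(hp):
    # Rebuild the improved architecture with tunable dropout rates and learning rate
    conv_dropout = hp.Float("conv_dropout", 0.1, 0.4, step=0.1)
    dense_dropout = hp.Float("dense_dropout", 0.3, 0.6, step=0.1)
    learning_rate = hp.Choice("learning_rate", [1e-2, 1e-3, 1e-4])
    model = keras.Sequential([
        keras.Input(shape=input_shape),
        layers.RandomFlip("horizontal"),
        layers.RandomRotation(0.1),
        layers.RandomZoom(0.1),
        layers.Conv2D(32, kernel_size=(3, 3), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Dropout(conv_dropout),
        layers.Conv2D(64, kernel_size=(3, 3), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Dropout(conv_dropout),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dropout(dense_dropout),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(
        loss="categorical_crossentropy",
        optimizer=keras.optimizers.Adam(learning_rate=learning_rate),
        metrics=["accuracy"],
    )
    return model

tuner = kt.RandomSearch(
    build_model,
    objective="val_accuracy",
    max_trials=10,
    directory="tuning",          # illustrative output location
    project_name="cifar10_cnn",  # illustrative project name
)
tuner.search(
    x_train, y_train,
    epochs=20,
    validation_data=(x_val, y_val),
    callbacks=[keras.callbacks.EarlyStopping(monitor="val_loss", patience=5)],
)
best_hp = tuner.get_best_hyperparameters(1)[0]
print(best_hp.values)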
This practical exercise demonstrated how to apply dropout, data augmentation, and callbacks (ModelCheckpoint, EarlyStopping) to improve a CNN's generalization and manage the training process more effectively. These techniques are fundamental tools in the deep learning practitioner's toolkit for building more reliable and performant models.