In this section, we apply model improvement techniques to a Convolutional Neural Network (CNN) designed for image classification. The goal is to enhance the model's performance and make training more stable: not only to achieve higher accuracy, but also to build a model that generalizes better to unseen data and to streamline the development workflow.

We'll assume we have a baseline CNN trained on a dataset like CIFAR-10. A common issue with such models, especially when trained for many epochs, is overfitting: the model performs well on the training data but poorly on the validation or test data. Our baseline model might exhibit this behavior.

### Setting Up the Scenario

First, let's establish our starting point. Imagine our baseline CNN (from Chapter 4) achieved approximately 70% accuracy on the CIFAR-10 validation set after 20 epochs, but the validation accuracy plateaued or even started decreasing while the training accuracy continued to climb. This divergence is a classic sign of overfitting.

Our task is to rebuild and retrain this model using Dropout, Data Augmentation, and Callbacks to address overfitting and manage the training process. We'll use Keras 3, keeping in mind its backend flexibility (the syntax below works with either the PyTorch or TensorFlow backend).

```python
# Assume necessary imports: keras, layers, torch (or tensorflow)
# Assume the CIFAR-10 dataset is loaded and preprocessed into:
#   x_train, y_train, x_val, y_val, x_test, y_test
# (with pixel values scaled, e.g., to [0, 1], and labels one-hot encoded)
import keras
from keras import layers
# import torch  # If using the PyTorch backend

# Define input shape and number of classes for CIFAR-10
input_shape = (32, 32, 3)
num_classes = 10

# Baseline model structure (simplified from Chapter 4 for illustration)
# baseline_model = keras.Sequential(
#     [
#         keras.Input(shape=input_shape),
#         layers.Conv2D(32, kernel_size=(3, 3), activation="relu"),
#         layers.MaxPooling2D(pool_size=(2, 2)),
#         layers.Conv2D(64, kernel_size=(3, 3), activation="relu"),
#         layers.MaxPooling2D(pool_size=(2, 2)),
#         layers.Flatten(),
#         layers.Dense(128, activation="relu"),
#         layers.Dense(num_classes, activation="softmax"),
#     ]
# )
# baseline_model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
# history_baseline = baseline_model.fit(x_train, y_train, batch_size=128, epochs=20,
#                                       validation_data=(x_val, y_val))

# Baseline evaluation:
# baseline_loss, baseline_acc = baseline_model.evaluate(x_test, y_test)
# print(f"Baseline Test Accuracy: {baseline_acc:.4f}")  # Example result: ~0.68
```

### Applying Regularization: Dropout

To combat overfitting, we'll introduce Dropout layers. Dropout randomly sets a fraction of input units to 0 at each update during training, which helps prevent complex co-adaptations on the training data.
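To see the mechanism in isolation before wiring it into the network, here is a minimal sketch (separate from the model above) that applies a standalone Dropout layer to a toy tensor of ones; the toy tensor and the 0.25 rate are just illustrative choices:

```python
import keras
from keras import layers, ops

# Standalone illustration of what a Dropout layer does (toy example)
drop = layers.Dropout(0.25)
x = ops.ones((1, 8))

# During training, roughly 25% of the values are zeroed and the remaining values
# are scaled by 1 / (1 - 0.25) so the expected sum of activations is unchanged.
print(drop(x, training=True))

# At inference time the layer passes its input through unchanged.
print(drop(x, training=False))
```

In a full model, Keras handles this training/inference switch automatically, so our only decisions are where to place the Dropout layers and which rates to use.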
We'll add Dropout after the pooling layers and before the final dense layer.

```python
# Improved model with Dropout
improved_model = keras.Sequential(
    [
        keras.Input(shape=input_shape),
        layers.Conv2D(32, kernel_size=(3, 3), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        # Dropout after pooling
        layers.Dropout(0.25),  # Dropout rate of 25%
        layers.Conv2D(64, kernel_size=(3, 3), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        # Dropout after pooling
        layers.Dropout(0.25),  # Dropout rate of 25%
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        # Dropout before the final dense layer
        layers.Dropout(0.5),  # A higher dropout rate is common for dense layers
        layers.Dense(num_classes, activation="softmax"),
    ]
)
improved_model.summary()
```

The dropout rates (0.25 and 0.5) are common starting points; they are hyperparameters that can be tuned later.

### Implementing Data Augmentation

Data augmentation artificially expands the training dataset by creating modified versions of existing images. This makes the model more invariant to transformations like rotations, shifts, or flips, which improves generalization. Keras offers preprocessing layers that can be included directly in the model definition or applied to the dataset pipeline; adding them to the model is convenient (a sketch of the pipeline alternative appears after the callbacks code below).

```python
# Model definition incorporating Data Augmentation layers
data_augmentation = keras.Sequential(
    [
        layers.RandomFlip("horizontal"),
        layers.RandomRotation(0.1),
        layers.RandomZoom(0.1),
        # Add other augmentations as needed (e.g., RandomContrast)
    ]
)

improved_model_with_aug = keras.Sequential(
    [
        keras.Input(shape=input_shape),
        # Apply augmentation layers first
        data_augmentation,
        # Rest of the model structure (with Dropout)
        layers.Conv2D(32, kernel_size=(3, 3), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Dropout(0.25),
        layers.Conv2D(64, kernel_size=(3, 3), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Dropout(0.25),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),
    ]
)
improved_model_with_aug.summary()
```

Now the model will see slightly different versions of the training images in each epoch.

### Using Callbacks for Efficient Training

Callbacks allow us to automate actions during training. We'll use ModelCheckpoint to save the model weights only when validation performance improves, and EarlyStopping to halt training if validation performance stops improving for a defined number of epochs.

```python
# Define callbacks
callbacks = [
    keras.callbacks.ModelCheckpoint(
        filepath="best_model.keras",  # Path to save the model file
        save_best_only=True,          # Only save when val_loss improves
        monitor="val_loss",           # Metric to monitor
        verbose=1,                    # Log when saving
    ),
    keras.callbacks.EarlyStopping(
        monitor="val_loss",           # Metric to monitor
        patience=10,                  # Number of epochs with no improvement to wait
        verbose=1,                    # Log when stopping
        restore_best_weights=True,    # Restore weights from the epoch with the best val_loss
    ),
    # Optional: add a TensorBoard callback
    # keras.callbacks.TensorBoard(log_dir="./logs"),
]

# Compile the improved model (with augmentation and dropout)
improved_model_with_aug.compile(
    loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"]
)
```
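One more note before training: as mentioned in the data augmentation subsection, the same preprocessing layers can instead be applied in the input pipeline rather than inside the model. The sketch below assumes the TensorFlow backend and the in-memory arrays from the setup above, and it is an alternative to (not an addition on top of) the in-model `data_augmentation` block:

```python
import tensorflow as tf  # TensorFlow backend only

# Build a tf.data pipeline from the in-memory arrays
train_ds = (
    tf.data.Dataset.from_tensor_slices((x_train, y_train))
    .shuffle(10_000)
    .batch(128)
    # Apply the augmentation layers here instead of inside the model;
    # if you do this, remove `data_augmentation` from the model definition.
    .map(
        lambda images, labels: (data_augmentation(images, training=True), labels),
        num_parallel_calls=tf.data.AUTOTUNE,
    )
    .prefetch(tf.data.AUTOTUNE)
)

# Training then takes the dataset instead of raw arrays, e.g.:
# improved_model_with_aug.fit(train_ds, epochs=50,
#                             validation_data=(x_val, y_val), callbacks=callbacks)
```

The trade-off: pipeline augmentation runs asynchronously on the CPU while the accelerator trains, whereas in-model augmentation is saved with the model and stays backend-agnostic. We'll stick with the in-model approach for the rest of this section.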
### Training and Evaluating the Improved Model

Now we train the enhanced model using the fit method, passing our callbacks list. We might need to allow more epochs initially, as regularization and augmentation can slow down convergence, but EarlyStopping will prevent excessive training.

```python
# Train the model
print("Training the improved model...")
history_improved = improved_model_with_aug.fit(
    x_train,
    y_train,
    batch_size=128,
    epochs=50,  # Allow more epochs; EarlyStopping will manage it
    validation_data=(x_val, y_val),
    callbacks=callbacks,  # Include our defined callbacks
)
# Note: EarlyStopping might stop training before 50 epochs.
# The 'best_model.keras' file now contains the weights from the epoch with the lowest val_loss.

# Option 1: EarlyStopping restored the best weights automatically if restore_best_weights=True
print("\nEvaluating the model with restored best weights (due to EarlyStopping)...")
improved_loss, improved_acc = improved_model_with_aug.evaluate(x_test, y_test, verbose=0)
print(f"Improved Test Accuracy (from EarlyStopping): {improved_acc:.4f}")

# Option 2: Explicitly load the best model saved by ModelCheckpoint
# print("\nLoading the best model saved by ModelCheckpoint and evaluating...")
# best_model = keras.models.load_model("best_model.keras")
# improved_loss, improved_acc = best_model.evaluate(x_test, y_test, verbose=0)
# print(f"Improved Test Accuracy (from best_model.keras): {improved_acc:.4f}")

# Compare with baseline (example results)
# Baseline Test Accuracy: ~0.68
# Improved Test Accuracy: ~0.75 (expected improvement)
```

You should observe that the gap between training and validation accuracy is smaller than in the baseline model, indicating reduced overfitting, and the final test accuracy should ideally exceed the baseline.

### Visualizing Improvement

Let's visualize the training history to see the impact. We expect the validation curve of the improved model to track the training curve more closely and potentially reach a higher peak.

[Figure: Model Training History Comparison — training and validation accuracy vs. epoch for the baseline and improved models.]

Comparison of training and validation accuracy curves for the baseline model (blue) and the improved model (green) with regularization and augmentation. Note how the improved model shows less overfitting (a smaller gap between the training and validation curves) and achieves better validation accuracy. Early stopping might terminate the improved model's training around epoch 25-30.
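To reproduce a comparison like this from your own runs, you can plot the History objects directly. A minimal matplotlib sketch, assuming `history_baseline` is available (i.e., the commented-out baseline fit above was actually run) and that both models were compiled with `metrics=["accuracy"]`:

```python
import matplotlib.pyplot as plt

def plot_history(history, label, color):
    """Plot training (dashed) and validation (solid) accuracy for one History object."""
    epochs = range(1, len(history.history["accuracy"]) + 1)
    plt.plot(epochs, history.history["accuracy"], linestyle="--", color=color,
             label=f"{label} Train Acc")
    plt.plot(epochs, history.history["val_accuracy"], color=color,
             label=f"{label} Val Acc")

plt.figure(figsize=(8, 5))
plot_history(history_baseline, "Baseline", "#74c0fc")  # assumes the baseline run was kept
plot_history(history_improved, "Improved", "#69db7c")
plt.xlabel("Epoch")
plt.ylabel("Accuracy")
plt.title("Model Training History Comparison")
plt.legend()
plt.show()
```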
### Saving the Final Model

After evaluation, if you are satisfied, you can save the final trained model (which may already be saved as best_model.keras thanks to ModelCheckpoint).

```python
# If you didn't use ModelCheckpoint or want to save the current state
# improved_model_with_aug.save("final_improved_cnn_cifar10.keras")
# loaded_model = keras.models.load_model("final_improved_cnn_cifar10.keras")
# print("Model saved and re-loaded successfully.")
```

### Further Steps: Hyperparameter Tuning

While we've applied several effective techniques, achieving optimal performance often requires tuning hyperparameters. This could involve experimenting with:

- Different Dropout rates.
- The strength and types of Data Augmentation.
- The learning rate for the optimizer (Adam's default learning rate might not be optimal).
- Network architecture variations (number of filters, layers).
- Different optimization algorithms (RMSprop, SGD with momentum).

Tools like KerasTuner or Optuna can help automate this search process (a minimal KerasTuner sketch follows at the end of this section), but even manual experimentation based on the validation results observed during training (perhaps using TensorBoard) can yield significant gains.

This practical exercise demonstrated how to apply dropout, data augmentation, and callbacks (ModelCheckpoint, EarlyStopping) to improve a CNN's generalization and manage the training process more effectively. These techniques are fundamental tools in the deep learning practitioner's toolkit for building more reliable and performant models.
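As promised above, here is a minimal KerasTuner sketch for automating part of that search. It assumes the `keras-tuner` package is installed and reuses the setup from earlier in this section; the hyperparameter names and search ranges are illustrative choices, not recommendations:

```python
import keras_tuner  # pip install keras-tuner

def build_model(hp):
    """Rebuild the augmented CNN with a tunable dropout rate and learning rate."""
    conv_dropout = hp.Float("conv_dropout", min_value=0.1, max_value=0.4, step=0.1)
    dense_dropout = hp.Float("dense_dropout", min_value=0.3, max_value=0.6, step=0.1)
    learning_rate = hp.Float("learning_rate", min_value=1e-4, max_value=1e-2, sampling="log")

    model = keras.Sequential(
        [
            keras.Input(shape=input_shape),
            # Fresh augmentation layers per trial, matching the earlier configuration
            layers.RandomFlip("horizontal"),
            layers.RandomRotation(0.1),
            layers.RandomZoom(0.1),
            layers.Conv2D(32, (3, 3), activation="relu"),
            layers.MaxPooling2D((2, 2)),
            layers.Dropout(conv_dropout),
            layers.Conv2D(64, (3, 3), activation="relu"),
            layers.MaxPooling2D((2, 2)),
            layers.Dropout(conv_dropout),
            layers.Flatten(),
            layers.Dense(128, activation="relu"),
            layers.Dropout(dense_dropout),
            layers.Dense(num_classes, activation="softmax"),
        ]
    )
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=learning_rate),
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model

tuner = keras_tuner.RandomSearch(
    build_model,
    objective="val_accuracy",
    max_trials=10,           # Number of hyperparameter combinations to try
    directory="tuning",      # Where trial results are stored
    project_name="cifar10_cnn",
)
tuner.search(x_train, y_train, epochs=15, validation_data=(x_val, y_val))
best_tuned_model = tuner.get_best_models(num_models=1)[0]
```

Each trial retrains the model from scratch, so keep max_trials and epochs modest at first; the best configuration found can then be retrained fully with the callbacks from earlier in this section.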