Okay, let's put the concepts from this chapter into practice. We've discussed the importance of weight initialization, how learning rate schedules can help convergence, and methods like grid search and random search for finding good hyperparameter values. Now, you'll get hands-on experience applying these ideas to tune a deep learning model.
Tuning hyperparameters is often more art than science, requiring experimentation. However, a systematic approach significantly increases your chances of finding a configuration that leads to better model performance and generalization.
For this exercise, we'll use a common scenario: image classification using a simple Convolutional Neural Network (CNN) on the CIFAR-10 dataset. CIFAR-10 consists of 60,000 32x32 color images in 10 classes. We'll assume you have a basic PyTorch environment set up and are familiar with defining models, loading data, and writing training loops.
Our goal isn't to build the absolute best CIFAR-10 classifier, but rather to demonstrate the process of hyperparameter tuning.
First, let's define a simple CNN architecture using PyTorch. This will be our base model:
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleCNN(nn.Module):
    def __init__(self, dropout_rate=0.5):
        super().__init__()
        # Convolutional layers
        self.conv1 = nn.Conv2d(3, 16, kernel_size=3, padding=1)  # Input: 3x32x32 -> Output: 16x32x32
        self.pool = nn.MaxPool2d(2, 2)                           # After pooling: 16x16x16
        self.conv2 = nn.Conv2d(16, 32, kernel_size=3, padding=1) # Output: 32x16x16
        # After second pooling: 32x8x8

        # Fully connected layers
        self.fc1 = nn.Linear(32 * 8 * 8, 128)    # Flattened size: 32*8*8 = 2048
        self.dropout = nn.Dropout(dropout_rate)  # Apply dropout
        self.fc2 = nn.Linear(128, 10)            # 10 output classes

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = torch.flatten(x, 1)  # Flatten all dimensions except batch
        x = F.relu(self.fc1(x))
        x = self.dropout(x)
        x = self.fc2(x)
        return x

# Note: Weight initialization (like He init) is often handled by default
# in PyTorch layers, but could be explicitly set here if needed.
```
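If you do want explicit control, you can apply an initialization scheme after constructing the model. The sketch below applies Kaiming (He) initialization to the convolutional and linear layers; the helper name `init_weights` is just illustrative:

```python
def init_weights(module):
    # He (Kaiming) initialization suits ReLU activations
    if isinstance(module, (nn.Conv2d, nn.Linear)):
        nn.init.kaiming_normal_(module.weight, nonlinearity="relu")
        if module.bias is not None:
            nn.init.zeros_(module.bias)

model = SimpleCNN()
model.apply(init_weights)  # Recursively applies init_weights to every submodule
```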
We also need standard data loading and transformation pipelines for CIFAR-10. We'll assume you have a function `load_cifar10_data(batch_size)` that returns PyTorch `DataLoader` instances for the training and validation sets. Remember to include normalization.
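For reference, here is one way such a function could look, using torchvision and holding out part of the official training set for validation. The split fraction and normalization statistics below are assumptions, not requirements:

```python
import torch
from torch.utils.data import DataLoader, random_split
import torchvision
import torchvision.transforms as transforms

def load_cifar10_data(batch_size, val_fraction=0.1, data_dir="./data"):
    # Per-channel mean/std values commonly used for CIFAR-10
    transform = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.4914, 0.4822, 0.4465),
                             (0.2470, 0.2435, 0.2616)),
    ])

    full_train = torchvision.datasets.CIFAR10(
        root=data_dir, train=True, download=True, transform=transform)

    # Hold out a validation split from the 50,000 training images
    val_size = int(len(full_train) * val_fraction)
    train_size = len(full_train) - val_size
    train_set, val_set = random_split(full_train, [train_size, val_size])

    train_loader = DataLoader(train_set, batch_size=batch_size, shuffle=True, num_workers=2)
    val_loader = DataLoader(val_set, batch_size=batch_size, shuffle=False, num_workers=2)
    return train_loader, val_loader
```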
Based on the chapter content, several hyperparameters are candidates for tuning:

- Optimizer choice: `Adam` vs `SGD` with Momentum. Adam is often a good default, but SGD+Momentum can sometimes achieve slightly better generalization with careful tuning. For simplicity here, let's stick with `Adam` but tune its learning rate.
- Learning rate: typically the most impactful hyperparameter, best searched on a logarithmic scale.
- Weight decay: the L2 regularization strength passed to the optimizer, also searched on a logarithmic scale.
- Dropout rate: the dropout probability used in `SimpleCNN`. Values typically range from 0.1 to 0.5.
- Batch size: affects gradient noise and training speed.

We'll use Random Search for efficiency. Let's define the search space:

- Learning rate: log-uniform between 1e-4 and 1e-2
- Weight decay: log-uniform between 1e-5 and 1e-3
- Dropout rate: uniform between 0.1 and 0.5
- Batch size: sampled from {64, 128, 256}
The core idea is to run multiple training experiments (trials), each with a randomly sampled set of hyperparameters from our defined space. We train for a fixed, relatively small number of epochs (e.g., 10-15) to get a quick signal, record the validation performance, and then compare the results across trials.
Here's a conceptual outline of the tuning loop:
```python
import random
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim

# Assume SimpleCNN and load_cifar10_data are defined as above
# Assume train_one_epoch() and evaluate() functions exist

num_trials = 20            # Number of random configurations to try
num_epochs_per_trial = 10  # Train for a short duration
results = []

for trial in range(num_trials):
    print(f"--- Trial {trial+1}/{num_trials} ---")

    # 1. Sample hyperparameters
    lr = 10**np.random.uniform(-4, -2)            # Log-uniform sampling for LR
    weight_decay = 10**np.random.uniform(-5, -3)  # Log-uniform for weight decay
    dropout_rate = random.uniform(0.1, 0.5)
    batch_size = random.choice([64, 128, 256])
    print(f"Sampled: lr={lr:.6f}, wd={weight_decay:.6f}, dropout={dropout_rate:.4f}, batch_size={batch_size}")

    # 2. Set up dataloaders, model, optimizer
    train_loader, val_loader = load_cifar10_data(batch_size=batch_size)
    model = SimpleCNN(dropout_rate=dropout_rate)
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")  # Use CUDA if available
    model.to(device)
    optimizer = optim.Adam(model.parameters(), lr=lr, weight_decay=weight_decay)
    criterion = nn.CrossEntropyLoss()

    best_val_accuracy = 0.0

    # 3. Train for a fixed number of epochs
    for epoch in range(num_epochs_per_trial):
        # train_one_epoch(model, train_loader, criterion, optimizer, device)
        # val_loss, val_accuracy = evaluate(model, val_loader, criterion, device)
        # Dummy training/evaluation for structure illustration
        print(f"  Epoch {epoch+1}/{num_epochs_per_trial} - Simulating training...")
        # In a real run, update best_val_accuracy based on evaluate() results.
        # For this example, simulate a result:
        simulated_val_accuracy = 0.3 + trial*0.01 + epoch*0.02 + random.uniform(-0.05, 0.05)  # Placeholder
        best_val_accuracy = max(best_val_accuracy, simulated_val_accuracy)

    print(f"Trial {trial+1} finished. Best Validation Accuracy: {best_val_accuracy:.4f}")

    # 4. Log results
    results.append({
        'trial': trial + 1,
        'lr': lr,
        'weight_decay': weight_decay,
        'dropout_rate': dropout_rate,
        'batch_size': batch_size,
        'best_val_accuracy': best_val_accuracy
    })

# 5. Analyze results (see next section)
print("\n--- Tuning Complete ---")

# Sort results by validation accuracy
results.sort(key=lambda x: x['best_val_accuracy'], reverse=True)

print("Top 5 configurations:")
for i in range(min(5, len(results))):
    print(f"Rank {i+1}: Acc={results[i]['best_val_accuracy']:.4f}, "
          f"LR={results[i]['lr']:.6f}, WD={results[i]['weight_decay']:.6f}, "
          f"Dropout={results[i]['dropout_rate']:.4f}, BS={results[i]['batch_size']}")
```
Note: The `train_one_epoch` and `evaluate` functions are standard PyTorch training components and are omitted here for brevity. You would implement them as usual.
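For completeness, one minimal way these two helpers could be written is sketched below; this is a standard training/evaluation loop, not the only valid implementation:

```python
def train_one_epoch(model, loader, criterion, optimizer, device):
    model.train()
    for inputs, targets in loader:
        inputs, targets = inputs.to(device), targets.to(device)
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()

def evaluate(model, loader, criterion, device):
    model.eval()
    total_loss, correct, total = 0.0, 0, 0
    with torch.no_grad():
        for inputs, targets in loader:
            inputs, targets = inputs.to(device), targets.to(device)
            outputs = model(inputs)
            total_loss += criterion(outputs, targets).item() * targets.size(0)
            correct += (outputs.argmax(dim=1) == targets).sum().item()
            total += targets.size(0)
    return total_loss / total, correct / total  # (average loss, accuracy)
```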
After running the tuning loop, the `results` list contains the performance for each hyperparameter configuration. Sorting by validation accuracy gives you the best-performing configurations found during the search.
Visualizing the relationship between hyperparameters and performance can provide insights. For instance, let's plot validation accuracy against the learning rate (on a log scale):
Validation accuracy achieved by different randomly sampled learning rates after 10 training epochs. Values between roughly 10⁻³ and 3×10⁻³ seem to perform best in this simulated run.
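A scatter plot like the one described above can be generated directly from the `results` list. The snippet below is one way to do it, assuming matplotlib is available:

```python
import matplotlib.pyplot as plt

lrs = [r['lr'] for r in results]
accs = [r['best_val_accuracy'] for r in results]

plt.figure(figsize=(6, 4))
plt.scatter(lrs, accs)
plt.xscale('log')  # Learning rates were sampled log-uniformly
plt.xlabel('Learning rate (log scale)')
plt.ylabel('Best validation accuracy')
plt.title('Random search: learning rate vs. validation accuracy')
plt.tight_layout()
plt.show()
```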
Similar plots can be made for weight decay and dropout rate. You might observe, for example, that very low or very high dropout rates hurt performance, or that a moderate amount of weight decay is beneficial. Analyzing the top-performing trials can help you understand which hyperparameter values (or ranges) are most promising.
This hands-on practice demonstrates a fundamental workflow for improving deep learning models. While automated tools for hyperparameter optimization exist (e.g., Optuna, Ray Tune), understanding the manual process provides valuable intuition for setting up search spaces and interpreting results effectively. Experimentation is essential, so try adapting this process to your own models and datasets.
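As a pointer toward those tools, here is a rough sketch of how the same search could be expressed with Optuna. It assumes the pieces defined earlier (`SimpleCNN`, `load_cifar10_data`, `train_one_epoch`, `evaluate`, `device`, `num_epochs_per_trial`) are available, and the `objective` body mirrors one trial of the manual loop:

```python
import optuna

def objective(trial):
    # Sample hyperparameters from the same ranges as the manual search
    lr = trial.suggest_float("lr", 1e-4, 1e-2, log=True)
    weight_decay = trial.suggest_float("weight_decay", 1e-5, 1e-3, log=True)
    dropout_rate = trial.suggest_float("dropout_rate", 0.1, 0.5)
    batch_size = trial.suggest_categorical("batch_size", [64, 128, 256])

    train_loader, val_loader = load_cifar10_data(batch_size=batch_size)
    model = SimpleCNN(dropout_rate=dropout_rate).to(device)
    optimizer = optim.Adam(model.parameters(), lr=lr, weight_decay=weight_decay)
    criterion = nn.CrossEntropyLoss()

    best_val_accuracy = 0.0
    for epoch in range(num_epochs_per_trial):
        train_one_epoch(model, train_loader, criterion, optimizer, device)
        _, val_accuracy = evaluate(model, val_loader, criterion, device)
        best_val_accuracy = max(best_val_accuracy, val_accuracy)
    return best_val_accuracy  # Optuna maximizes this value

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=20)
print(study.best_params, study.best_value)
```

With this setup, Optuna takes over the sampling and result bookkeeping that the manual loop above handles explicitly, while the training logic stays unchanged.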