Now that you understand the individual components of training, let's integrate them into a complete, working example. This practical exercise will guide you through setting up a model, preparing data, and implementing both the training and evaluation loops, solidifying the concepts discussed in this chapter. We will also touch upon saving the trained model state.
For this exercise, we'll tackle a simple linear regression problem using synthetic data. Our goal is to train a model to learn the relationship y = 2x + 1.
First, let's import the necessary PyTorch modules and define some basic hyperparameters for our training process.
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import TensorDataset, DataLoader
# Hyperparameters
learning_rate = 0.01
num_epochs = 100
batch_size = 16
# Device configuration (use GPU if available)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")
We'll generate synthetic data for a linear relationship and wrap it using TensorDataset and DataLoader.
# Generate synthetic data: y = 2x + 1 + noise
true_weight = torch.tensor([[2.0]])
true_bias = torch.tensor([1.0])
# Generate training data
X_train_tensor = torch.randn(100, 1) * 5 # 100 examples, 1 feature
y_train_tensor = true_weight * X_train_tensor + true_bias + torch.randn(100, 1) * 0.5 # Add some noise
# Generate validation data (separate set)
X_val_tensor = torch.randn(20, 1) * 5 # 20 examples, 1 feature
y_val_tensor = true_weight * X_val_tensor + true_bias + torch.randn(20, 1) * 0.5
# Create datasets
train_dataset = TensorDataset(X_train_tensor, y_train_tensor)
val_dataset = TensorDataset(X_val_tensor, y_val_tensor)
# Create dataloaders
train_loader = DataLoader(dataset=train_dataset, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(dataset=val_dataset, batch_size=batch_size, shuffle=False) # No need to shuffle validation data
Here, TensorDataset conveniently wraps our input features (X) and target labels (y) tensors. DataLoader then takes this dataset and provides iterable batches, handling shuffling and batching automatically.
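If you want to confirm what the loader yields, a quick peek at one batch (a minimal check using the train_loader defined above, not part of the original script) shows the expected shapes:

# Inspect a single batch from the DataLoader (sanity check only).
features_batch, labels_batch = next(iter(train_loader))
print(features_batch.shape)  # torch.Size([16, 1]) -> batch_size examples, 1 feature each
print(labels_batch.shape)    # torch.Size([16, 1]) -> one target per example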
Now, define the model architecture, the loss function, and the optimizer. Since we are modeling a linear relationship y=wx+b, a single linear layer is sufficient.
# Define the model (a simple linear layer)
# Input feature size = 1, Output feature size = 1
model = nn.Linear(1, 1).to(device) # Move model to the selected device
# Define the loss function (Mean Squared Error for regression)
loss_fn = nn.MSELoss()
# Define the optimizer (Stochastic Gradient Descent)
optimizer = optim.SGD(model.parameters(), lr=learning_rate)
print("Model definition:")
print(model)
print("\nInitial parameters:")
for name, param in model.named_parameters():
    if param.requires_grad:
        print(f"{name}: {param.data.squeeze()}")
We instantiate nn.Linear, which represents the operation y=Wx+b. PyTorch automatically initializes the weight (W) and bias (b) parameters. We use Mean Squared Error (nn.MSELoss) as it's standard for regression tasks, measuring the average squared difference between predictions and true values. Stochastic Gradient Descent (optim.SGD) is chosen to update the model's parameters based on the computed gradients. Notice we pass model.parameters() to the optimizer so it knows which tensors to update. Finally, we move the model to the configured device (CPU or GPU).
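To connect the layer back to the formula y=Wx+b, the short check below (a sketch, assuming the model, loss_fn, and device defined above) computes the same quantities by hand and compares them to what nn.Linear and nn.MSELoss return:

# Manual check of what nn.Linear and nn.MSELoss compute (illustrative only).
x = torch.randn(4, 1, device=device)   # a small batch of 4 examples
y = 2.0 * x + 1.0                      # targets from the true relationship

with torch.no_grad():
    pred_layer = model(x)                                # W x + b via nn.Linear
    pred_manual = x @ model.weight.T + model.bias        # same computation by hand
    print(torch.allclose(pred_layer, pred_manual))       # True

    mse_layer = loss_fn(pred_layer, y)
    mse_manual = ((pred_layer - y) ** 2).mean()          # mean squared error by hand
    print(torch.allclose(mse_layer, mse_manual))         # True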
Next comes the training loop: the core of the process, where the model iteratively learns from the data.
print("\nStarting Training...")
for epoch in range(num_epochs):
    model.train()  # Set the model to training mode
    running_loss = 0.0
    num_batches = 0

    # Iterate over batches from the DataLoader
    for i, (features, labels) in enumerate(train_loader):
        # Move batch data to the same device as the model
        features = features.to(device)
        labels = labels.to(device)

        # 1. Forward pass: Compute model's predictions
        outputs = model(features)

        # 2. Calculate the loss
        loss = loss_fn(outputs, labels)

        # 3. Backward pass: Compute gradients
        # First, zero the gradients from the previous step
        optimizer.zero_grad()
        # Then, perform backpropagation
        loss.backward()

        # 4. Optimizer step: Update model weights
        optimizer.step()

        # Accumulate loss for reporting
        running_loss += loss.item()
        num_batches += 1

    # Print average loss for the epoch
    avg_epoch_loss = running_loss / num_batches
    if (epoch + 1) % 10 == 0:  # Print every 10 epochs
        print(f"Epoch [{epoch+1}/{num_epochs}], Training Loss: {avg_epoch_loss:.4f}")
print("Training Finished!")
Let's break down the steps inside the epoch loop:

- model.train(): Sets the model to training mode. This is important for layers like Dropout or BatchNorm, which behave differently during training and evaluation.
- We iterate over train_loader to get batches of features and labels.
- Each batch is moved to the device where the model resides. This prevents runtime errors from mismatched devices.
- outputs = model(features) calculates the model's predictions for the input batch.
- loss = loss_fn(outputs, labels) computes the difference between predictions and actual labels using the MSE criterion.
- optimizer.zero_grad(): Clears old gradients. If you forget this, gradients will accumulate from previous iterations, leading to incorrect updates.
- loss.backward(): Computes the gradient of the loss with respect to all model parameters that have requires_grad=True.
- optimizer.step(): Updates the model's parameters (model.parameters()) using the gradients computed in the backward pass and the optimization algorithm (SGD in this case).
- We accumulate running_loss so we can report the average loss for the epoch.

After training (or periodically during training, e.g., after each epoch), we need to evaluate the model's performance on unseen data (the validation set) without updating its weights.
print("\nStarting Evaluation...")
model.eval() # Set the model to evaluation mode
total_val_loss = 0.0
num_val_batches = 0
# Disable gradient calculations for evaluation
with torch.no_grad():
    for features, labels in val_loader:
        # Move batch data to the device
        features = features.to(device)
        labels = labels.to(device)

        # Forward pass
        outputs = model(features)
        # Calculate loss
        loss = loss_fn(outputs, labels)

        total_val_loss += loss.item()
        num_val_batches += 1
avg_val_loss = total_val_loss / num_val_batches
print(f"Validation Loss: {avg_val_loss:.4f}")
# Inspect the learned parameters
print("\nLearned parameters:")
for name, param in model.named_parameters():
    if param.requires_grad:
        print(f"{name}: {param.data.squeeze()}")
print(f"(True weight: {true_weight.item():.4f}, True bias: {true_bias.item():.4f})")
Key differences in the evaluation loop:

- model.eval(): Sets the model to evaluation mode.
- with torch.no_grad(): This context manager disables gradient calculation within the block. This is important because we don't need gradients for evaluation, and it reduces memory consumption and speeds up computation (a short check appears below).
- There is no loss.backward() or optimizer.step(), because we are only measuring performance, not training.

After evaluation, we print the learned parameters. Compare them to the true_weight (2.0) and true_bias (1.0) we used to generate the data. They should be reasonably close after 100 epochs.
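As a quick check of the torch.no_grad() behavior mentioned above (a minimal sketch using the model and device already defined): outputs produced inside the block carry no gradient history, while a normal forward pass does.

# Demonstrate the effect of torch.no_grad() (illustrative only).
x_check = torch.randn(4, 1, device=device)

with torch.no_grad():
    out_no_grad = model(x_check)
print(out_no_grad.requires_grad)   # False: no graph was built, nothing to backpropagate

out_tracked = model(x_check)
print(out_tracked.requires_grad)   # True: autograd tracks this forward pass for training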
Persisting your trained model is essential. The standard practice is to save the model's state_dict, which contains all its learned parameters (weights and biases).
# Saving the model's learned parameters
model_save_path = 'linear_regression_model.pth'
torch.save(model.state_dict(), model_save_path)
print(f"\nModel state_dict saved to {model_save_path}")
# Example of loading the model state
# First, instantiate the model architecture again
loaded_model = nn.Linear(1, 1).to(device)
# Then, load the saved state dictionary
loaded_model.load_state_dict(torch.load(model_save_path))
print("Model state_dict loaded successfully.")
# Remember to set the loaded model to evaluation mode if using for inference
loaded_model.eval()
# You can now use loaded_model for predictions
# Example prediction with the loaded model:
with torch.no_grad():
    sample_input = torch.tensor([[10.0]]).to(device)  # Example input
    prediction = loaded_model(sample_input)
    print(f"Prediction for input 10.0: {prediction.item():.4f}")
# Expected output should be close to 2*10 + 1 = 21
Saving the state_dict is generally preferred over saving the entire model object because it's more flexible and less prone to breaking if the underlying code changes. To load the state, you need to create an instance of the same model architecture first and then load the dictionary into it.
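If you plan to resume training later rather than only run inference, a common pattern is to save a checkpoint dictionary that also includes the optimizer state and the current epoch. The sketch below assumes the model, optimizer, and hyperparameters from this example; the file name checkpoint.pth is arbitrary.

# Save a fuller training checkpoint (model + optimizer state + epoch).
checkpoint = {
    'epoch': num_epochs,
    'model_state_dict': model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
}
torch.save(checkpoint, 'checkpoint.pth')

# To resume: rebuild the objects, then restore their states.
resumed_model = nn.Linear(1, 1).to(device)
resumed_optimizer = optim.SGD(resumed_model.parameters(), lr=learning_rate)

checkpoint = torch.load('checkpoint.pth')
resumed_model.load_state_dict(checkpoint['model_state_dict'])
resumed_optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
start_epoch = checkpoint['epoch']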
Here is the complete script combining all the parts:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import TensorDataset, DataLoader
# 1. Setup: Hyperparameters and Device
learning_rate = 0.01
num_epochs = 100
batch_size = 16
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")
# 2. Data Preparation
true_weight = torch.tensor([[2.0]])
true_bias = torch.tensor([1.0])
X_train_tensor = torch.randn(100, 1, device=device) * 5 # Generate data directly on device
y_train_tensor = true_weight.to(device) * X_train_tensor + true_bias.to(device) + torch.randn(100, 1, device=device) * 0.5
X_val_tensor = torch.randn(20, 1, device=device) * 5
y_val_tensor = true_weight.to(device) * X_val_tensor + true_bias.to(device) + torch.randn(20, 1, device=device) * 0.5
train_dataset = TensorDataset(X_train_tensor, y_train_tensor)
val_dataset = TensorDataset(X_val_tensor, y_val_tensor)
train_loader = DataLoader(dataset=train_dataset, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(dataset=val_dataset, batch_size=batch_size, shuffle=False)
# 3. Model, Loss, and Optimizer
model = nn.Linear(1, 1).to(device)
loss_fn = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=learning_rate)
print("Model definition:")
print(model)
print("\nInitial parameters:")
for name, param in model.named_parameters():
    if param.requires_grad:
        print(f"{name}: {param.data.squeeze()}")
# 4. Training Loop
print("\nStarting Training...")
for epoch in range(num_epochs):
    model.train()
    running_loss = 0.0
    num_batches = 0
    for i, (features, labels) in enumerate(train_loader):
        # Data is already on the correct device
        outputs = model(features)
        loss = loss_fn(outputs, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
        num_batches += 1
    avg_epoch_loss = running_loss / num_batches
    if (epoch + 1) % 10 == 0:
        print(f"Epoch [{epoch+1}/{num_epochs}], Training Loss: {avg_epoch_loss:.4f}")
print("Training Finished!")
# 5. Evaluation Loop
print("\nStarting Evaluation...")
model.eval()
total_val_loss = 0.0
num_val_batches = 0
with torch.no_grad():
    for features, labels in val_loader:
        # Data is already on the correct device
        outputs = model(features)
        loss = loss_fn(outputs, labels)
        total_val_loss += loss.item()
        num_val_batches += 1
avg_val_loss = total_val_loss / num_val_batches
print(f"Validation Loss: {avg_val_loss:.4f}")
print("\nLearned parameters:")
for name, param in model.named_parameters():
    if param.requires_grad:
        print(f"{name}: {param.data.squeeze().item():.4f}")  # Use .item() for single values
print(f"(True weight: {true_weight.item():.4f}, True bias: {true_bias.item():.4f})")
# 6. Saving and Loading Model State
model_save_path = 'linear_regression_model.pth'
torch.save(model.state_dict(), model_save_path)
print(f"\nModel state_dict saved to {model_save_path}")
loaded_model = nn.Linear(1, 1).to(device)
loaded_model.load_state_dict(torch.load(model_save_path))
loaded_model.eval()
print("Model state_dict loaded successfully.")
with torch.no_grad():
    sample_input = torch.tensor([[10.0]]).to(device)
    prediction = loaded_model(sample_input)
    print(f"Prediction for input 10.0: {prediction.item():.4f}")
(Note: In the combined script, data generation was slightly modified to create the tensors directly on the target device for efficiency, removing the need for .to(device) inside the loops for batch data.)
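For larger, real-world datasets it is more common to keep the dataset on the CPU and move each batch inside the loop, optionally using pinned memory for faster host-to-GPU copies. The following sketch (illustrative only; the regenerated CPU tensors are hypothetical) shows that variant:

# Common alternative for larger datasets: keep tensors on the CPU and move each
# batch to the GPU inside the loop, using pinned memory for faster transfers.
X_cpu = torch.randn(100, 1) * 5                       # CPU tensors
y_cpu = 2.0 * X_cpu + 1.0 + torch.randn(100, 1) * 0.5

cpu_loader = DataLoader(
    TensorDataset(X_cpu, y_cpu),
    batch_size=batch_size,
    shuffle=True,
    pin_memory=torch.cuda.is_available(),  # page-locked memory speeds up GPU copies
)

for features, labels in cpu_loader:
    features = features.to(device, non_blocking=True)
    labels = labels.to(device, non_blocking=True)
    # ... forward pass, loss, backward pass, optimizer step as in the training loop above ...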
This hands-on example demonstrates the fundamental structure for training virtually any model in PyTorch. You now have a template combining data loading, model definition, training iteration, evaluation, and persistence. You can adapt this structure for more complex models and datasets by changing the model architecture in step 3 and the data preparation in step 2. The core logic of the training and evaluation loops remains remarkably consistent.
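As one illustration of that adaptability (a sketch, not part of the original example): replacing the single linear layer in step 3 with a small multi-layer network changes only the model definition, while the training and evaluation loops stay exactly the same. The layer sizes below are arbitrary.

# Hypothetical drop-in replacement for step 3: a small multi-layer perceptron.
model = nn.Sequential(
    nn.Linear(1, 16),   # 1 input feature -> 16 hidden units (arbitrary width)
    nn.ReLU(),
    nn.Linear(16, 1),   # 16 hidden units -> 1 output
).to(device)

# Loss and optimizer are defined exactly as before.
loss_fn = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=learning_rate)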