Training a neural network is an iterative optimization process. You provide the model with data, measure how inaccurate its predictions are, and then adjust its internal parameters (weights and biases) slightly to reduce that inaccuracy. This cycle repeats many times. The code structure that manages this repetitive process is commonly referred to as the training loop.
At a high level, training usually involves two nested loops:
an outer loop over epochs, where each epoch is one complete pass through the training dataset, and an inner loop over batches, which works through that dataset one batch at a time. The DataLoader you learned about previously is responsible for providing these batches. Training in batches is memory-efficient and can also lead to more stable convergence and better generalization compared to processing samples one by one or using the entire dataset at once.
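To make this structure concrete, here is a minimal sketch of the two nested loops. The names num_epochs and train_loader are assumptions for illustration, and the per-batch work is elided for now:

# Conceptual code: The two nested loops of training
for epoch in range(num_epochs):                        # outer loop: one pass over the dataset
    for input_batch, target_batch in train_loader:     # inner loop: one batch at a time
        ...                                            # per-batch steps covered below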
For every batch processed within an epoch, the training loop executes a sequence of well-defined steps. Let's break down what happens in a typical iteration:
The first step is to fetch the next batch of input data and its corresponding targets from the DataLoader. It's also important at this stage to ensure the data is transferred to the correct computational device (CPU or GPU) where your model parameters reside.
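For example, if the model lives on a GPU, the batch tensors must be moved there as well. A minimal sketch, assuming a device variable was created earlier (for instance with torch.device("cuda" if torch.cuda.is_available() else "cpu")):

# Conceptual code: Move the batch to the device holding the model
input_batch = input_batch.to(device)
target_batch = target_batch.to(device)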
Next, you reset the gradients. PyTorch accumulates gradients by default, so before computing gradients for the current batch you must clear any values left over from the previous iteration by calling the zero_grad() method on your optimizer object.
# Conceptual code: Reset gradients before the new batch processing
optimizer.zero_grad()
With the gradients cleared, you run the forward pass: the input batch is passed through the model to produce its predictions.
# Conceptual code: Get model predictions
predictions = model(input_batch)
You then compute the loss by comparing the predictions against the true target_batch using your chosen loss function (criterion), such as nn.CrossEntropyLoss for classification or nn.MSELoss for regression. The loss function returns a single scalar value representing the average error or discrepancy for the current batch. This value indicates how well (or poorly) the model performed on this specific batch.
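The criterion itself is typically constructed once, before the training loop starts, and then reused for every batch. A minimal sketch, assuming a classification task (the specific loss is an illustrative choice):

# Conceptual code: Create the loss function once, before training
criterion = nn.CrossEntropyLoss()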
# Conceptual code: Compute the loss
loss = criterion(predictions, target_batch)
With the loss calculated, the backward pass comes next. Calling loss.backward() computes the gradient of the loss scalar with respect to every model parameter that has requires_grad=True (which is the default for parameters within nn.Module). These gradients represent the sensitivity of the loss to changes in each parameter; essentially, they tell the optimizer how to adjust each weight to decrease the loss.
# Conceptual code: Compute gradients via backpropagation
loss.backward()
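After this call, each parameter's gradient is available in its .grad attribute. If you want to confirm that gradients were populated, a small sketch for inspection (assuming model is the nn.Module being trained):

# Conceptual code: Inspect gradients after backward()
for name, param in model.named_parameters():
    if param.grad is not None:
        print(name, param.grad.shape)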
Finally, you update the model's parameters. Calling optimizer.step() updates each parameter based on its computed gradient and the optimizer's specific algorithm (like SGD with momentum, Adam, etc.). The goal is to take a small step in the direction that minimizes the loss.
# Conceptual code: Update model parameters
optimizer.step()
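The optimizer referenced in this step is created once, before the training loop begins, and is given the model's parameters to manage. A minimal sketch, assuming Adam with a learning rate of 1e-3 (both the algorithm and the learning rate are illustrative choices):

# Conceptual code: Create the optimizer once, before training
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)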
These six steps form the core of one iteration within the training loop. This cycle is repeated for every batch provided by the DataLoader. Once all batches have been processed, one epoch is complete, and the outer loop begins the next epoch, repeating the entire batch iteration process.
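Putting these pieces together, the code below shows one possible shape of the complete loop. It is a sketch rather than a finished program: it assumes model, train_loader, criterion, optimizer, device, and num_epochs have all been defined as discussed above.

# Conceptual code: A complete training loop combining the steps above
for epoch in range(num_epochs):                           # outer loop: epochs
    for input_batch, target_batch in train_loader:        # inner loop: fetch each batch
        # Move the batch to the device holding the model parameters
        input_batch = input_batch.to(device)
        target_batch = target_batch.to(device)

        optimizer.zero_grad()                              # reset gradients
        predictions = model(input_batch)                   # forward pass
        loss = criterion(predictions, target_batch)        # compute the loss
        loss.backward()                                    # backward pass (compute gradients)
        optimizer.step()                                   # update parameters

In practice you would typically also record the running loss, for example by reading loss.item(), so you can monitor how training progresses from epoch to epoch.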
Flow diagram illustrating the sequence of operations within a single batch iteration of the PyTorch training loop.