Once the model architecture and loss function are in place, the model still needs a mechanism for learning from data. This mechanism is the training loop, an iterative process in which data is repeatedly presented to the model, errors are calculated, and the model's parameters are adjusted to reduce those errors. For Graph Neural Networks (GNNs) operating on a single graph in a full-batch setting, this loop orchestrates learning across multiple cycles, or epochs.
Each cycle in the training loop consists of a few distinct, sequential steps that work together to refine the model's internal weights. Let's break down this fundamental process.
A single pass through the training data, often called a training step or iteration, involves four main operations: the forward pass, loss computation, backpropagation, and parameter update.
The process begins with the forward pass, where we feed the graph data into the GNN. The model takes the node features, X, and the graph structure, represented by the adjacency information (or edge index), as input. It then processes this information through its layers, performing neighborhood aggregation and updates at each layer. The output of the final layer is typically a set of raw, unnormalized predictions for each node, often called logits.
```python
# graph contains node features (x) and edge structure (edge_index)
logits = model(graph.x, graph.edge_index)
```
These logits represent the model's current belief about the class of each node before any normalization like a softmax function is applied.
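To make the forward pass concrete, here is a minimal sketch of a two-layer GNN written in plain PyTorch. The `MeanAggregationLayer` and `TinyGNN` classes, the toy path graph, and all dimensions are assumptions made for illustration; they are not the model used elsewhere in this section, and a library such as PyTorch Geometric would normally supply the layers.

```python
import torch
from torch import nn


class MeanAggregationLayer(nn.Module):
    """Hypothetical layer: average neighbor features, then apply a linear map."""

    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(2 * in_dim, out_dim)

    def forward(self, x, edge_index):
        src, dst = edge_index  # messages flow from src to dst
        num_nodes = x.size(0)
        # Sum the features of each node's incoming neighbors.
        summed = torch.zeros_like(x).index_add_(0, dst, x[src])
        # Count incoming edges per node; clamp avoids division by zero.
        degree = torch.zeros(num_nodes, dtype=x.dtype).index_add_(
            0, dst, torch.ones(dst.size(0), dtype=x.dtype)
        )
        mean = summed / degree.clamp(min=1).unsqueeze(-1)
        # Combine each node's own features with its neighborhood average.
        return self.linear(torch.cat([x, mean], dim=-1))


class TinyGNN(nn.Module):
    def __init__(self, in_dim, hidden_dim, num_classes):
        super().__init__()
        self.layer1 = MeanAggregationLayer(in_dim, hidden_dim)
        self.layer2 = MeanAggregationLayer(hidden_dim, num_classes)

    def forward(self, x, edge_index):
        h = torch.relu(self.layer1(x, edge_index))
        return self.layer2(h, edge_index)  # raw logits, no softmax


torch.manual_seed(0)
x = torch.randn(4, 5)  # 4 nodes, 5 features each
# Undirected path 0-1-2-3, stored as directed edges in both directions.
edge_index = torch.tensor([[0, 1, 1, 2, 2, 3],
                           [1, 0, 2, 1, 3, 2]])
model = TinyGNN(in_dim=5, hidden_dim=8, num_classes=3)
logits = model(x, edge_index)
print(logits.shape)  # torch.Size([4, 3]): one row of logits per node
```

The output has one row per node and one column per class, which is the shape the loss function expects in the next step.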
Next, we quantify how wrong the model's predictions are. We compare the output logits from the forward pass with the true, ground-truth labels. This comparison is handled by the loss function we selected earlier, such as CrossEntropyLoss for node classification.
An important detail in GNN training is that we often compute the loss only on the nodes designated for training. This is managed using a boolean mask with one entry per node. The mask filters the logits and labels so that only the training nodes contribute to the loss calculation. This is essential in a semi-supervised or transductive setting, where we have labels for only a subset of nodes in the graph.
$$L = \text{loss\_function}(\text{logits}[\text{train\_mask}], \text{labels}[\text{train\_mask}])$$

The result is a single scalar value, the loss, which represents the model's error for the current batch of training data.
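The masking step can be sketched with a few hand-picked numbers. The logits, labels, and mask below are toy values assumed for illustration; `F.cross_entropy` is the functional form of `CrossEntropyLoss`. The manual computation is included only to show what the masked loss averages over.

```python
import torch
import torch.nn.functional as F

# Toy values (assumed for illustration): 4 nodes, 3 classes.
logits = torch.tensor([[2.0, 0.5, 0.1],
                       [0.2, 1.5, 0.3],
                       [0.1, 0.2, 3.0],
                       [1.0, 1.0, 1.0]])
labels = torch.tensor([0, 1, 2, 0])
train_mask = torch.tensor([True, True, False, False])

# Boolean indexing keeps only the rows where the mask is True.
loss = F.cross_entropy(logits[train_mask], labels[train_mask])

# Same value by hand: mean negative log-probability of the true class,
# taken over the two training nodes only.
log_probs = F.log_softmax(logits, dim=-1)
manual = -(log_probs[0, 0] + log_probs[1, 1]) / 2

print(loss.dim())                    # 0, i.e. a scalar
print(torch.isclose(loss, manual))   # tensor(True)
```

Nodes 2 and 3 still take part in the forward pass, so their features influence their neighbors' logits, but their labels never enter the loss.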
With the loss calculated, we need to determine how each of the model's parameters contributed to this error. This is the job of backpropagation. Deep learning frameworks like PyTorch automate this complex process. By calling loss.backward(), the framework computes the gradient of the loss with respect to every learnable parameter (weights and biases) in the model.
Each gradient indicates how the loss changes as its parameter changes, and by how much. A positive gradient for a weight means that increasing the weight increases the loss, while a negative gradient means that increasing the weight decreases it.
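A small autograd example may help with reading gradient signs. The quadratic loss and the two weights below are assumptions chosen so the gradients can be checked by hand; they stand in for a real model's parameters.

```python
import torch

# Two toy weights and the values that would make the loss zero.
w = torch.tensor([2.0, 1.0], requires_grad=True)
target = torch.tensor([5.0, -1.0])

loss = ((w - target) ** 2).sum()
loss.backward()  # fills w.grad with d(loss)/d(w) = 2 * (w - target)

print(w.grad.tolist())  # [-6.0, 4.0]
```

The first gradient is negative, so raising that weight (toward 5) lowers the loss. The second is positive, so raising that weight (away from -1) makes the loss worse.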
The final step is to use these gradients to update the model's parameters. This is handled by the optimizer, such as Adam or SGD. The optimizer's step() method moves each parameter in the direction that reduces the loss, that is, opposite to its gradient, with the size of the move scaled by a learning rate.
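For plain SGD the update rule is simply `w <- w - lr * grad`, which can be verified on a single scalar. The weight, loss, and learning rate here are toy assumptions; Adam and other optimizers apply additional per-parameter scaling that this sketch does not show.

```python
import torch

w = torch.tensor(2.0, requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1)

loss = (w - 5.0) ** 2
loss.backward()    # w.grad is 2 * (2 - 5) = -6.0
optimizer.step()   # w <- 2.0 - 0.1 * (-6.0)

print(round(w.item(), 4))  # 2.6
```

The weight moved from 2.0 toward 5.0, the value that minimizes this loss.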
Before starting the next training step, it's necessary to reset the gradients. This is because PyTorch accumulates gradients by default: each call to loss.backward() adds to the values already stored rather than overwriting them. We call optimizer.zero_grad() to clear the old gradients, ensuring that the parameter update for the current step is not influenced by gradients from previous steps.
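The accumulation behavior is easy to observe directly. This sketch uses an assumed scalar weight and calls `backward()` twice without a reset in between, then clears the gradient.

```python
import torch

w = torch.tensor(2.0, requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1)

((w - 5.0) ** 2).backward()
first = w.grad.item()          # gradient from one backward pass

((w - 5.0) ** 2).backward()    # no reset in between
accumulated = w.grad.item()    # the second gradient was added to the first

optimizer.zero_grad()
# Depending on the PyTorch version, a cleared gradient is None or a zero tensor.
cleared = w.grad is None or w.grad.item() == 0.0

print(first, accumulated, cleared)  # -6.0 -12.0 True
```

Without the reset, the second update would use a gradient twice as large as intended.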
This entire four-step sequence forms a single training iteration.
The iterative cycle of a single training step. Data flows through the model to produce predictions, which are used to calculate a loss. Gradients derived from this loss guide the optimizer in updating the model's weights for the next iteration.
An epoch is defined as one full pass over the entire training dataset. In the full-batch training common for smaller graphs, a single training step as described above processes the entire graph at once. Therefore, one training step is equivalent to one epoch.
The training loop runs this process for a set number of epochs. With each epoch, the optimizer nudges the model's weights closer to values that minimize the training loss, effectively teaching the model to recognize patterns in the graph structure and node features.
Here is what a typical training loop looks like in code for a GNN:
```python
# model: your GNN model
# graph: your graph data object
# optimizer: an optimizer like Adam
# loss_fn: a loss function like CrossEntropyLoss

def train(model, graph, optimizer, loss_fn):
    model.train()          # Set the model to training mode
    optimizer.zero_grad()  # Clear previous gradients

    # 1. Forward Pass
    logits = model(graph.x, graph.edge_index)

    # 2. Loss Calculation (using the training mask)
    loss = loss_fn(logits[graph.train_mask], graph.y[graph.train_mask])

    # 3. Backpropagation
    loss.backward()

    # 4. Parameter Update
    optimizer.step()

    return loss.item()

# The main loop over epochs
for epoch in range(200):
    loss = train(model, graph, optimizer, loss_fn)
    print(f"Epoch {epoch+1:03d}, Loss: {loss:.4f}")
```
As the loop progresses, we expect to see the training loss decrease. This indicates that the model is successfully learning to fit the training data. However, a decreasing training loss alone is not enough. We must also monitor the model's performance on unseen data, which is the role of the validation set, to ensure it is generalizing well and not just memorizing the training examples.
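One way to monitor held-out performance is an evaluation function that runs the model without gradient tracking and measures accuracy on a masked subset of nodes. The `evaluate` helper, the `val_mask` tensor, and the `FeatureOnlyModel` stand-in below are hypothetical names introduced for this sketch; in practice you would pass your trained GNN and your dataset's own validation mask.

```python
import torch
from torch import nn


@torch.no_grad()  # no gradients are needed for evaluation
def evaluate(model, x, edge_index, y, mask):
    """Return accuracy on the nodes selected by `mask`."""
    model.eval()  # disable dropout and similar training-only behavior
    logits = model(x, edge_index)
    predictions = logits[mask].argmax(dim=-1)
    return (predictions == y[mask]).float().mean().item()


class FeatureOnlyModel(nn.Module):
    """Hypothetical stand-in that ignores edges; used only to exercise evaluate()."""

    def forward(self, x, edge_index):
        return x  # treat the input features directly as logits


x = torch.tensor([[3.0, 0.0], [0.0, 2.0], [1.0, 0.0], [0.0, 1.0]])
edge_index = torch.tensor([[0, 1], [1, 0]])
y = torch.tensor([0, 1, 1, 1])
val_mask = torch.tensor([False, False, True, True])

val_acc = evaluate(FeatureOnlyModel(), x, edge_index, y, val_mask)
print(val_acc)  # 0.5: node 2 is misclassified, node 3 is correct
```

Calling a function like this once per epoch, alongside the training loss, makes it possible to spot the point where training loss keeps falling while validation accuracy stalls or drops.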