The training loop brings together the theoretical principles behind Graph Neural Networks (GNNs). This is where the model learns by repeatedly processing the data, calculating its error, and adjusting its parameters to minimize that error. A complete Python script will be provided to train and evaluate a GCN model. We will perform a semi-supervised node classification task, which is a common application for GNNs.
Our goal is to train the model to predict the community of each node in a social network, given the labels of only a small subset of nodes. This process directly applies the concepts of loss functions, backpropagation, and transductive data splitting.
Before we can train, we need three things: a GCN model, a graph dataset, and a way to split that data for training and evaluation.
Let's assume we have the GCN model defined in a file named model.py, which we built in the previous chapter. For this exercise, we'll use a two-layer GCN. It takes an input feature matrix X and a normalized adjacency matrix Â, and outputs log-probabilities over the classes for each node.
# A simplified GCN model for context
import torch
import torch.nn as nn
import torch.nn.functional as F

class GCN(nn.Module):
    def __init__(self, in_features, hidden_features, num_classes):
        super(GCN, self).__init__()
        # In a real implementation, these would be proper GCN layers
        self.fc1 = nn.Linear(in_features, hidden_features)
        self.fc2 = nn.Linear(hidden_features, num_classes)
        self.dropout = nn.Dropout(p=0.5)

    def forward(self, x, adj):
        # Simplified propagation rule: Â * X * W, applied before each layer.
        # We assume `adj` is the normalized adjacency matrix as a sparse tensor.
        x = torch.spmm(adj, x)
        x = F.relu(self.fc1(x))
        x = self.dropout(x)
        x = torch.spmm(adj, x)
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)
For our data, we'll use a simple, well-known graph: Zachary's Karate Club. This dataset represents a social network of 34 members of a karate club, with edges representing friendships. The task is to classify each member into one of four communities. We will use an identity matrix for the node features, a common practice when intrinsic node features are not available.
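As a concrete illustration, here is a minimal sketch of how such a graph could be prepared, assuming networkx is installed. It builds the symmetrically normalized adjacency matrix Â = D^(-1/2)(A + I)D^(-1/2) and identity features; labels and masks are left to the data-loading helper used later, which may prepare things differently.
# Sketch: preparing the karate club graph (assumes networkx is available)
import networkx as nx
import numpy as np
import torch

G = nx.karate_club_graph()                  # 34 members, edges are friendships
A = nx.to_numpy_array(G)                    # dense adjacency matrix A

# Symmetric normalization with self-loops: Â = D^{-1/2} (A + I) D^{-1/2}
A_hat = A + np.eye(A.shape[0])
D_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
adj = torch.tensor(D_inv_sqrt @ A_hat @ D_inv_sqrt, dtype=torch.float32).to_sparse()

# Identity matrix as node features: each node gets a unique one-hot vector
features = torch.eye(A.shape[0])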
The most important step for our training setup is creating masks to separate our data. In a transductive setting, the model sees the entire graph's structure (all nodes and edges) during training, but it only uses the labels of the training nodes to learn. We create boolean tensors, train_mask and test_mask, to select which nodes to use for loss calculation and which for performance evaluation.
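For example, the boolean masks could be created like this. The specific node indices are purely illustrative assumptions, not the split used by the helper function in the script below.
# Sketch: boolean masks for a transductive split (illustrative indices only)
import torch

num_nodes = 34
train_idx = torch.tensor([0, 5, 16, 33])              # a few labeled nodes
train_mask = torch.zeros(num_nodes, dtype=torch.bool)
train_mask[train_idx] = True                          # loss is computed here
test_mask = ~train_mask                               # accuracy is measured here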
Let's assemble the full script. We will perform the following steps:
1. Load and prepare the data: the normalized adjacency matrix, node features, labels, and the train/test masks.
2. Initialize the GCN model and the Adam optimizer.
3. Define an evaluation function that computes accuracy on the nodes selected by a mask.
4. Run the training loop, computing the negative log-likelihood loss (F.nll_loss, the functional form of NLLLoss) on the training nodes only, because our model's final layer is log_softmax.
5. Periodically evaluate on the test set and report the final accuracy.
Here is the complete code to train and evaluate our GCN.
import torch
import torch.optim as optim
import torch.nn.functional as F
import numpy as np
# Assume 'load_karate_club_data' is a helper function that returns:
# adj: Normalized adjacency matrix (torch sparse FloatTensor)
# features: Node features (torch.FloatTensor)
# labels: Node labels (torch.LongTensor)
# train_mask, test_mask: Boolean masks (torch.BoolTensor)
from utils import load_karate_club_data
# The simplified GCN model defined above, saved in model.py
from model import GCN
# 1. Load and prepare data
adj, features, labels, train_mask, test_mask = load_karate_club_data()
# Model and optimizer
num_features = features.shape[1]
num_classes = labels.max().item() + 1
model = GCN(in_features=num_features, hidden_features=16, num_classes=num_classes)
optimizer = optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)
# Function to evaluate the model
def evaluate(model, features, adj, labels, mask):
    model.eval()
    with torch.no_grad():
        logits = model(features, adj)
        logits = logits[mask]
        labels = labels[mask]
        _, indices = torch.max(logits, dim=1)
        correct = torch.sum(indices == labels)
        return correct.item() * 1.0 / len(labels)
# 2. The Training Loop
print("Starting training...")
for epoch in range(200):
    model.train()
    optimizer.zero_grad()

    # Forward pass
    output = model(features, adj)

    # Calculate loss only on training nodes
    loss_train = F.nll_loss(output[train_mask], labels[train_mask])

    # Backward pass and optimization
    loss_train.backward()
    optimizer.step()

    # Evaluate on the test set every 10 epochs
    if (epoch + 1) % 10 == 0:
        acc_test = evaluate(model, features, adj, labels, test_mask)
        print(f'Epoch: {epoch+1:03d}, Loss: {loss_train.item():.4f}, Test Accuracy: {acc_test:.4f}')
print("Training finished.")
# Final evaluation on the test set
final_accuracy = evaluate(model, features, adj, labels, test_mask)
print(f"Final Test Accuracy: {final_accuracy:.4f}")
When you run the script, you should see output that looks something like this:
Starting training...
Epoch: 010, Loss: 1.2567, Test Accuracy: 0.5882
Epoch: 020, Loss: 1.0453, Test Accuracy: 0.7059
Epoch: 030, Loss: 0.8122, Test Accuracy: 0.7647
...
Epoch: 190, Loss: 0.1534, Test Accuracy: 0.9412
Epoch: 200, Loss: 0.1489, Test Accuracy: 0.9412
Training finished.
Final Test Accuracy: 0.9412
Notice two important trends:
1. The training loss decreases steadily as the model fits the labeled training nodes.
2. The test accuracy rises over the epochs and then stabilizes, showing that the model generalizes to the unlabeled nodes.
We can visualize this process to get a clearer picture of the model's learning dynamics over the epochs.
The training loss decreases as the model fits the data, while the test accuracy increases and stabilizes, indicating successful generalization.
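To generate a plot like this yourself, you can record the loss and accuracy during training and chart them with matplotlib. The sketch below assumes two hypothetical lists, loss_history and acc_history, that were appended to at each epoch inside the training loop; they are not part of the script above.
# Sketch: plotting learning dynamics (assumes matplotlib is installed and that
# loss_history and acc_history were collected during training)
import matplotlib.pyplot as plt

fig, ax1 = plt.subplots()
ax1.plot(loss_history, color='tab:red')
ax1.set_xlabel('Epoch')
ax1.set_ylabel('Training loss')

ax2 = ax1.twinx()                       # second y-axis for accuracy
ax2.plot(acc_history, color='tab:blue')
ax2.set_ylabel('Test accuracy')

fig.tight_layout()
plt.show()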
This visualization confirms our observations. The model learns quickly in the initial epochs and then fine-tunes its parameters, eventually converging to a state with low training loss and high test accuracy. The use of dropout, controlled by model.train() and model.eval(), is what helps prevent the model from simply memorizing the training data and allows it to generalize well.
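As a quick standalone illustration of that switch, a Dropout layer only perturbs its input in training mode; in evaluation mode it passes the input through unchanged:
# Sketch: dropout behaves differently in train and eval mode
import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
x = torch.ones(1, 6)

drop.train()
print(drop(x))   # about half the entries zeroed, survivors scaled by 1/(1-p) = 2
drop.eval()
print(drop(x))   # identity: all ones, unchanged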
You have now successfully implemented a full training and evaluation pipeline for a GNN. You have taken a model architecture, trained it on a graph, and measured its ability to make predictions on unseen data. While this implementation was done from scratch to demonstrate the mechanics, specialized libraries can make this process much more efficient. In the next chapter, we will see how to achieve the same result with significantly less code using PyTorch Geometric.