Transforming node embeddings into task-specific predictions is a crucial stage in Graph Neural Network applications. For tasks like node classification, this transformation typically involves a final linear layer. This layer maps each node's D-dimensional embedding to a C-dimensional vector, where C represents the number of classes. The resulting output vector contains raw, unnormalized scores, commonly known as logits, for each class.
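As a concrete illustration, the following minimal sketch (using PyTorch, with a hypothetical `encoder` module standing in for any GNN message-passing stack) shows how a final linear layer maps D-dimensional node embeddings to C logits:

```python
import torch
import torch.nn as nn

class NodeClassifier(nn.Module):
    """Hypothetical wrapper: a GNN encoder followed by a linear classification head."""
    def __init__(self, encoder: nn.Module, embed_dim: int, num_classes: int):
        super().__init__()
        self.encoder = encoder                          # produces [num_nodes, D] node embeddings
        self.head = nn.Linear(embed_dim, num_classes)   # maps D-dim embeddings to C logits

    def forward(self, x, edge_index):
        h = self.encoder(x, edge_index)   # node embeddings, shape [num_nodes, D]
        logits = self.head(h)             # raw, unnormalized class scores, shape [num_nodes, C]
        return logits
```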
To train the model, we need a way to measure how far these predictions are from the true labels. This is the role of a loss function (or objective function). The loss function computes a single scalar value that quantifies the model's error. The goal of training is to adjust the model's weights to minimize this value.
For multi-class node classification, the most common loss function is the Cross-Entropy Loss. This function is a standard choice for classification problems in deep learning and works exceptionally well for GNNs.
It operates in two stages:

1. Softmax: the raw logits $z_1, \dots, z_C$ for a node are converted into a probability distribution over the classes, $\hat{y}_i = \frac{\exp(z_i)}{\sum_{j=1}^{C} \exp(z_j)}$.
2. Cross-entropy: the predicted distribution $\hat{y}$ is compared against the one-hot encoded true label $y$ using $L = -\sum_{i=1}^{C} y_i \log(\hat{y}_i)$.

Since only one $y_i$ is 1 (the true class) and the rest are 0, this formula simplifies to calculating the negative logarithm of the predicted probability for the correct class. A higher predicted probability for the correct class results in a lower loss, which is exactly what we want.
In practice, deep learning libraries like PyTorch and TensorFlow provide a single function, such as torch.nn.CrossEntropyLoss, that combines the Softmax activation and the cross-entropy calculation. Using this combined function is recommended as it provides better numerical stability than applying the two steps separately.
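In PyTorch, this looks roughly like the sketch below. The logits, labels, and training mask here are hypothetical stand-ins for the GNN's output and a typical semi-supervised node classification setup:

```python
import torch
import torch.nn as nn

# Hypothetical tensors for illustration: 4 nodes, 3 classes.
logits = torch.randn(4, 3, requires_grad=True)         # stand-in for the GNN's output logits
labels = torch.tensor([0, 2, 1, 2])                     # true class index for each node
train_mask = torch.tensor([True, True, False, True])    # only training nodes contribute to the loss

# CrossEntropyLoss expects raw logits; it applies log-softmax internally.
criterion = nn.CrossEntropyLoss()
loss = criterion(logits[train_mask], labels[train_mask])
loss.backward()  # gradients flow back through the GNN during training
```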
Figure: The process of calculating loss for a single node. The GNN's output embedding is passed through a classifier to get probabilities, which are then compared against the true label using a loss function.
Sometimes a node can belong to multiple categories simultaneously. For example, a research paper in a citation network might be about both "Graph Neural Networks" and "Reinforcement Learning". This is a multi-label classification problem, and Cross-Entropy Loss is not suitable because it assumes each node belongs to exactly one class.
For this scenario, the appropriate choice is Binary Cross-Entropy (BCE) Loss. The setup changes slightly: the final layer still produces $C$ logits, but instead of a softmax over all classes, each logit is passed through a sigmoid independently, giving the probability that the node belongs to that particular class. The target is a binary vector with a 1 for every class the node belongs to. For each class, the loss is

$$L = -\left[\, y \log(\hat{y}) + (1 - y) \log(1 - \hat{y}) \,\right]$$

Here, $y$ is either 0 or 1 (the true label for that class), and $\hat{y}$ is the predicted probability from the sigmoid function. PyTorch provides this as torch.nn.BCEWithLogitsLoss, which combines the sigmoid and BCE calculation for better numerical stability.
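A small sketch of this multi-label setup in PyTorch, using hypothetical tensors for illustration:

```python
import torch
import torch.nn as nn

# Hypothetical example: 3 nodes, 4 possible labels, multi-hot targets.
logits = torch.randn(3, 4, requires_grad=True)   # stand-in for the GNN's output logits
targets = torch.tensor([[1., 0., 1., 0.],        # node 0 belongs to classes 0 and 2
                        [0., 1., 0., 0.],        # node 1 belongs to class 1 only
                        [1., 1., 0., 1.]])       # node 2 belongs to classes 0, 1, and 3

# BCEWithLogitsLoss applies the sigmoid internally, so raw logits are passed in.
criterion = nn.BCEWithLogitsLoss()
loss = criterion(logits, targets)
loss.backward()
```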
While we've focused on node classification, GNNs are applied to other tasks that require different loss functions.
Link Prediction: This task is often framed as a binary classification problem: for any pair of nodes, does an edge exist between them? You can take the final embeddings of two nodes, $(h_u, h_v)$, combine them with an operator like a dot product, and pass the result through a sigmoid function to predict the probability of a link. The model is then trained using Binary Cross-Entropy Loss against the true graph structure, as sketched below.
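The following sketch, assuming PyTorch and hypothetical node-embedding, node-pair, and edge-label tensors, scores candidate edges with a dot product and trains against binary edge labels:

```python
import torch
import torch.nn as nn

# Hypothetical inputs: embeddings for 5 nodes and 4 candidate node pairs with labels.
h = torch.randn(5, 16, requires_grad=True)               # final node embeddings from the GNN
pairs = torch.tensor([[0, 1], [1, 2], [2, 4], [0, 3]])   # (u, v) node pairs to score
edge_labels = torch.tensor([1., 1., 0., 0.])             # 1 = edge exists, 0 = no edge

# Dot product of the two endpoint embeddings gives one logit per candidate pair.
scores = (h[pairs[:, 0]] * h[pairs[:, 1]]).sum(dim=-1)

# Binary cross-entropy against the true graph structure (sigmoid applied internally).
criterion = nn.BCEWithLogitsLoss()
loss = criterion(scores, edge_labels)
loss.backward()
```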
Graph Classification: In this task, the goal is to assign a label to an entire graph. A readout or pooling layer is used to aggregate all node embeddings into a single graph-level embedding, $h_G$. This embedding is then fed into a standard classifier. If it's a multi-class problem, you would use Cross-Entropy Loss, just as in node classification. If it's a regression task (e.g., predicting a molecular property), you might use a regression loss like Mean Squared Error (MSE).
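As a sketch of the classification case, assuming PyTorch, a simple mean-pooling readout, and hypothetical node embeddings grouped by graph via a batch vector:

```python
import torch
import torch.nn as nn

# Hypothetical inputs: node embeddings for two small graphs and a batch vector
# assigning each node to its graph (a common convention when batching graphs).
h = torch.randn(7, 16, requires_grad=True)     # 7 nodes spread across 2 graphs
batch = torch.tensor([0, 0, 0, 1, 1, 1, 1])    # graph id for each node
graph_labels = torch.tensor([0, 2])            # one class label per graph

# Mean-pooling readout: average the node embeddings within each graph to get h_G.
num_graphs = int(batch.max()) + 1
h_graph = torch.stack([h[batch == g].mean(dim=0) for g in range(num_graphs)])

# Standard classifier and cross-entropy loss on the graph-level embeddings.
classifier = nn.Linear(16, 3)
loss = nn.CrossEntropyLoss()(classifier(h_graph), graph_labels)
loss.backward()
```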
Choosing the right loss function is determined by the nature of your task's output. The principles are the same as in other deep learning domains; the main difference is that the predictions are derived from the unique structure of a GNN.