GNN architectures are powerful feature extractors, transforming each node's structural and attribute information into a dense vector embedding. These embeddings, typically collected into a node embedding matrix, contain high-level representations that are far more useful for downstream tasks than the raw input features. However, the GNN itself does not directly output class labels. To perform a task like node classification, a final component must be added to the model that maps these learned embeddings to class predictions.
This final component is often called a classification head. For most common GNN applications, this is simply a standard feed-forward neural network that takes the node embeddings as input. In its simplest and most common form, the classification head is a single linear layer without any additional hidden layers or non-linearities.
The purpose of this linear layer is to act as a trainable classifier. It takes an embedding of dimension d, where d is the output dimension of our GNN encoder, and projects it into a vector of size C, where C is the total number of classes in our dataset. Each element of this output vector is a raw, unnormalized score for a particular class. These scores are typically called logits.
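To make the dimensions concrete, the snippet below is a minimal sketch of a classification head in isolation; the node count, embedding dimension, and class count are arbitrary example values, not fixed by the architecture.

import torch
import torch.nn as nn

num_nodes, embed_dim, num_classes = 2708, 64, 7   # example values only

embeddings = torch.randn(num_nodes, embed_dim)     # stand-in for GNN output
classifier = nn.Linear(embed_dim, num_classes)     # the classification head

logits = classifier(embeddings)                    # raw, unnormalized class scores
print(logits.shape)                                # torch.Size([2708, 7])

Each row of logits holds one unnormalized score per class for the corresponding node.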
The full model pipeline for node classification can be visualized as a two-stage process:

1. GNN encoder: takes the raw node features and the graph structure and produces a dense embedding for each node.
2. Classification head: a linear layer that maps each node embedding to a vector of class logits.

This structure allows the model to learn both the graph representation and the classification task simultaneously during training.

Figure: The end-to-end architecture for node classification. The GNN encoder generates embeddings, which are then passed to a simple linear classifier to produce the final class logits.
Let's translate this architecture into a Python class using PyTorch and PyTorch Geometric. Assume we are building a two-layer Graph Convolutional Network (GCN) for node classification. The model class contains the GCN layers for encoding and a standard torch.nn.Linear layer for classification.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch_geometric.nn import GCNConv

class GCNNodeClassifier(nn.Module):
    """A two-layer GCN model for node classification."""

    def __init__(self, in_channels, hidden_channels, num_classes):
        super().__init__()
        # GNN encoder layers
        self.conv1 = GCNConv(in_channels, hidden_channels)
        self.conv2 = GCNConv(hidden_channels, hidden_channels)
        # Classification head
        self.classifier = nn.Linear(hidden_channels, num_classes)

    def forward(self, x, edge_index):
        # 1. GNN encoder: obtain node embeddings
        h = self.conv1(x, edge_index)   # first GCN layer
        h = F.relu(h)
        h = self.conv2(h, edge_index)   # second GCN layer
        # Final node embeddings are now in 'h'

        # 2. Classification head: produce one logit vector per node
        output = self.classifier(h)
        return output
In this implementation:
- in_channels: The dimensionality of the input node features (e.g., 1433 for the Cora dataset).
- hidden_channels: The dimensionality of the node embeddings produced by the GNN layers. This is a hyperparameter you can tune.
- num_classes: The number of distinct node labels in your dataset (e.g., 7 for the Cora dataset).
- forward(self, x, edge_index): This method defines the computation flow. The input node features x and graph structure edge_index are passed through the two GCN layers, with a ReLU activation function applied in between. The resulting embeddings h are then passed to the self.classifier layer to get the final logits.

A significant aspect of many node classification tasks is that they operate in a semi-supervised (or, more accurately, transductive) setting. This means that while we have labels for only a small subset of nodes (the training set), the GNN encoder uses the entire graph structure, including all nodes and edges, to generate embeddings.
In a transductive setting, the GNN model has access to the features and connections of all nodes in the graph during training, even those in the validation and test sets. The model's task is to predict the labels for the unlabeled nodes within this seen graph.
The GCNNodeClassifier we defined above is built for this. Its forward method computes embeddings and logits for every single node in the graph. In the next section, when we discuss loss functions, we will see how to use a mask to ensure that the model's error is only calculated based on the predictions for the labeled training nodes.
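As a rough usage sketch, the following assumes a PyTorch Geometric Data object; the Planetoid Cora dataset is used here purely as an example. A single forward pass produces logits for every node in the graph, labeled or not.

from torch_geometric.datasets import Planetoid

# Example dataset; any Data object with 'x' and 'edge_index' works the same way
dataset = Planetoid(root='data/Planetoid', name='Cora')
data = dataset[0]

model = GCNNodeClassifier(
    in_channels=dataset.num_features,   # 1433 for Cora
    hidden_channels=64,                 # hyperparameter choice
    num_classes=dataset.num_classes,    # 7 for Cora
)

logits = model(data.x, data.edge_index)
print(logits.shape)   # torch.Size([2708, 7]), one logit vector per node

The boolean masks stored on data, such as data.train_mask, are what we will use in the next section to restrict the loss to the labeled training nodes.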
With this end-to-end model structure in place, our next task is to define an objective function to measure its performance and guide its learning process. This brings us to the topic of loss functions.