GNN architectures are powerful feature extractors, transforming each node's structural and attribute information into a dense vector embedding. These embeddings, typically collected into a node embedding matrix, contain high-level representations that are far more useful for downstream tasks than the raw input features. However, the GNN itself does not directly output class labels. To perform a task like node classification, a final component must be added to the model that maps these learned embeddings to class predictions.
This final component is often called a classification head. For most common GNN applications, this is simply a standard feed-forward neural network that takes the node embeddings as input. In its simplest and most common form, the classification head is a single linear layer without any additional hidden layers or non-linearities.
The purpose of this linear layer is to act as a trainable classifier. It takes an embedding of dimension d, where d is the output dimension of our GNN encoder, and projects it into a vector of size C, where C is the total number of classes in our dataset. Each element of this output vector is a raw, unnormalized score for a particular class. These scores are typically called logits.
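To make the dimensions concrete, the snippet below is a minimal sketch of a classification head in isolation; the node count, embedding dimension, and class count are arbitrary example values, not fixed by the architecture.

import torch
import torch.nn as nn

num_nodes, embed_dim, num_classes = 2708, 64, 7   # example values only

embeddings = torch.randn(num_nodes, embed_dim)     # stand-in for GNN output
classifier = nn.Linear(embed_dim, num_classes)     # the classification head

logits = classifier(embeddings)                    # raw, unnormalized class scores
print(logits.shape)                                # torch.Size([2708, 7])

Each row of logits holds one unnormalized score per class for the corresponding node.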
The full model pipeline for node classification can be visualized as a two-stage process:

1. GNN encoder: takes the raw node features and the graph structure and produces a dense embedding for each node.
2. Classification head: a linear layer that maps each node embedding to a vector of class logits.

This structure allows the model to learn both the graph representation and the classification task simultaneously during training.

Figure: The end-to-end architecture for node classification. The GNN encoder generates embeddings, which are then passed to a simple linear classifier to produce the final class logits.
Let's translate this architecture into a Python class using PyTorch and PyTorch Geometric. Assume we are building a two-layer Graph Convolutional Network (GCN) for node classification. The model class contains the GCN layers for encoding and a standard torch.nn.Linear layer for classification.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch_geometric.nn import GCNConv

class GCNNodeClassifier(nn.Module):
    """A two-layer GCN model for node classification."""

    def __init__(self, in_channels, hidden_channels, num_classes):
        super().__init__()
        # GNN encoder layers
        self.conv1 = GCNConv(in_channels, hidden_channels)
        self.conv2 = GCNConv(hidden_channels, hidden_channels)
        # Classification head
        self.classifier = nn.Linear(hidden_channels, num_classes)

    def forward(self, x, edge_index):
        # 1. GNN encoder: obtain node embeddings
        h = self.conv1(x, edge_index)   # first GCN layer
        h = F.relu(h)
        h = self.conv2(h, edge_index)   # second GCN layer
        # Final node embeddings are now in 'h'

        # 2. Classification head: produce one logit vector per node
        output = self.classifier(h)
        return output
In this implementation:
- in_channels: The dimensionality of the input node features (e.g., 1433 for the Cora dataset).
- hidden_channels: The dimensionality of the node embeddings produced by the GNN layers. This is a hyperparameter you can tune.
- num_classes: The number of distinct node labels in your dataset (e.g., 7 for the Cora dataset).
- forward(self, x, edge_index): This method defines the computation flow. The input node features x and graph structure edge_index are passed through the two GCN layers, with a ReLU activation function applied in between. The resulting embeddings h are then passed to the self.classifier layer to get the final logits.

A significant aspect of many node classification tasks is that they operate in a semi-supervised (or, more accurately, transductive) setting. This means that while we have labels for only a small subset of nodes (the training set), the GNN encoder uses the entire graph structure, including all nodes and edges, to generate embeddings.
In a transductive setting, the GNN model has access to the features and connections of all nodes in the graph during training, even those in the validation and test sets. The model's task is to predict the labels for the unlabeled nodes within this seen graph.
The GCNNodeClassifier we defined above is built for this. Its forward method computes embeddings and logits for every single node in the graph. In the next section, when we discuss loss functions, we will see how to use a mask to ensure that the model's error is only calculated based on the predictions for the labeled training nodes.
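As a rough usage sketch, the following assumes a PyTorch Geometric Data object; the Planetoid Cora dataset is used here purely as an example. A single forward pass produces logits for every node in the graph, labeled or not.

from torch_geometric.datasets import Planetoid

# Example dataset; any Data object with 'x' and 'edge_index' works the same way
dataset = Planetoid(root='data/Planetoid', name='Cora')
data = dataset[0]

model = GCNNodeClassifier(
    in_channels=dataset.num_features,   # 1433 for Cora
    hidden_channels=64,                 # hyperparameter choice
    num_classes=dataset.num_classes,    # 7 for Cora
)

logits = model(data.x, data.edge_index)
print(logits.shape)   # torch.Size([2708, 7]), one logit vector per node

The boolean masks stored on data, such as data.train_mask, are what we will use in the next section to restrict the loss to the labeled training nodes.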
With this end-to-end model structure in place, our next task is to define an objective function to measure its performance and guide its learning process. This brings us to the topic of loss functions.