In this section, we translate the mathematical formulation of Graph Convolutional Networks into a working model and solidify our understanding of how the GCN's message passing mechanism operates at a practical level. We will build a two-layer GCN using PyTorch, focusing on the main matrix operations that define the graph convolution.
This approach gives you a clear view of the model's inner workings before we move on to higher-level libraries in Chapter 5, which abstract away many of these details.
To build a GCN, we need three primary components:
1. A normalized adjacency matrix, computed once from the graph structure.
2. A GCN layer with a trainable weight matrix that transforms and propagates node features.
3. A model that stacks these layers, with non-linear activations in between, to produce the final node representations.
Let's begin by defining a simple, small graph and preparing its matrices for computation. We will use a graph with four nodes.
First, we define our graph's adjacency matrix A and an initial feature matrix X. Then, we perform the normalization procedure discussed in the "Graph Convolutional Networks (GCN)" section. The propagation rule for a GCN layer is:

$$H^{(l+1)} = \sigma\left(\hat{D}^{-1/2} \hat{A} \hat{D}^{-1/2} H^{(l)} W^{(l)}\right)$$

where $\hat{A} = A + I$ is the adjacency matrix with added self-loops, $\hat{D}$ is its degree matrix, $H^{(l)}$ is the matrix of node features at layer $l$, $W^{(l)}$ is the layer's trainable weight matrix, and $\sigma$ is a non-linear activation.

Our first task is to compute the normalized adjacency matrix, $\hat{D}^{-1/2} \hat{A} \hat{D}^{-1/2}$. We will use PyTorch for all tensor operations.
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
# 1. Define graph structure and features
adj = torch.tensor([
    [0., 1., 1., 0.],
    [1., 0., 1., 1.],
    [1., 1., 0., 0.],
    [0., 1., 0., 0.]
])
features = torch.tensor([
    [10., 2.],
    [5., 15.],
    [20., 10.],
    [1., 1.]
])
# 2. Add self-loops to the adjacency matrix
# A_hat = A + I
adj_hat = adj + torch.eye(adj.shape[0])
# 3. Calculate the degree matrix D_hat
# D_ii = Sum_j A_ij
degree_hat = torch.diag(torch.sum(adj_hat, dim=1))
# 4. Compute D_hat^(-1/2)
# Add a small epsilon for numerical stability
degree_hat_inv_sqrt = torch.pow(degree_hat.diag() + 1e-5, -0.5)
degree_hat_inv_sqrt = torch.diag(degree_hat_inv_sqrt)
# 5. Calculate the normalized adjacency matrix
# norm_A = D_hat^(-1/2) * A_hat * D_hat^(-1/2)
norm_adj = torch.matmul(torch.matmul(degree_hat_inv_sqrt, adj_hat), degree_hat_inv_sqrt)
print("Normalized Adjacency Matrix:")
print(norm_adj)
The output norm_adj is the matrix that will be used to propagate information between nodes. It is symmetric and its values are scaled based on node degrees. This matrix is constant throughout the model's training process and only needs to be computed once.
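As a quick, optional sanity check (this snippet is only for illustration and is not part of the model), we can confirm two properties mentioned above: the matrix is symmetric, and each diagonal entry equals the inverse of that node's degree (including its self-loop), up to the small epsilon we added.

# Verify symmetry of the normalized adjacency matrix
print(torch.allclose(norm_adj, norm_adj.T))  # True
# Each diagonal entry is approximately 1 / d_i, the inverse degree with self-loop
print(norm_adj.diag())
print(1.0 / adj_hat.sum(dim=1))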
Next, we create a custom PyTorch module for a single GCN layer. This class will contain one trainable weight matrix W and its forward method will execute the core multiplication: norm_adj @ features @ W.
class GCNLayer(nn.Module):
    """
    A single Graph Convolutional Network layer.
    """
    def __init__(self, in_features, out_features):
        super(GCNLayer, self).__init__()
        # Define the trainable weight matrix W
        self.weights = nn.Parameter(torch.FloatTensor(in_features, out_features))
        # Initialize weights with a standard method
        nn.init.xavier_uniform_(self.weights)

    def forward(self, adj, features):
        # First, transform the features: X * W
        support = torch.matmul(features, self.weights)
        # Then, propagate features: norm_A * (X * W)
        output = torch.matmul(adj, support)
        return output
This GCNLayer class cleanly encapsulates the logic. It takes the pre-computed normalized adjacency matrix and the node features as input and produces new node embeddings. The nn.Parameter wrapper ensures that PyTorch's autograd system tracks gradients for our weight matrix during training.
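As a brief usage sketch (the hidden size of 4 here is arbitrary and chosen only for this demonstration), we can apply a single GCNLayer to the small graph defined earlier and check the shape of the resulting embeddings and the number of trainable values:

# Apply one layer: 2 input features per node -> 4 output features per node
layer = GCNLayer(in_features=2, out_features=4)
embeddings = layer(norm_adj, features)
print(embeddings.shape)  # torch.Size([4, 4]): one 4-dimensional embedding per node
# Only the 2 x 4 weight matrix is trainable, so the layer has 8 parameters
print(sum(p.numel() for p in layer.parameters()))  # 8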
With the GCNLayer defined, stacking them to form a deep GNN is straightforward. We will create a two-layer GCN for node classification. This model will:
1. Pass the input node features through a first GCN layer, projecting them into a hidden representation.
2. Apply a ReLU activation to introduce non-linearity.
3. Pass the result through a second GCN layer, which maps the hidden representation to one score per class for each node.

The data flows through the two layers in sequence, and the same normalized adjacency matrix is passed to each layer to guide the aggregation of messages.
Here is the implementation of the full model:
class GCN(nn.Module):
    """
    A two-layer Graph Convolutional Network.
    """
    def __init__(self, in_features, hidden_features, out_features):
        super(GCN, self).__init__()
        self.gc1 = GCNLayer(in_features, hidden_features)
        self.gc2 = GCNLayer(hidden_features, out_features)

    def forward(self, adj, features):
        # First layer and ReLU activation
        x = F.relu(self.gc1(adj, features))
        # Second layer
        x = self.gc2(adj, x)
        return x
This GCN class combines two instances of our GCNLayer. The forward method defines the computation flow: the output of the first layer becomes the input to the second, with a ReLU activation in between to introduce non-linearity, which allows the model to learn more complex functions.
Now we can put everything together. We will instantiate our GCN model and pass our prepared graph data through it to get the final node embeddings. Let's assume our task is to classify each node into one of three categories.
# Get the dimensions from our data
num_nodes = features.shape[0]
in_features = features.shape[1]
hidden_features = 16 # A common choice for hidden layer size
out_features = 3 # Number of classes
# Instantiate the model
model = GCN(in_features, hidden_features, out_features)
# Perform a forward pass
logits = model(norm_adj, features)
print("\nModel Output (Logits):")
print(logits)
print("\nOutput Shape:", logits.shape)
The output of this script will be a tensor of shape [4, 3]. This represents the raw, unnormalized scores (logits) for each of the 4 nodes belonging to each of the 3 classes.
Model Output (Logits):
tensor([[-1.9965, -3.2057,  2.0163],
        [-2.3150, -4.5161,  3.7142],
        [-2.2965, -4.0189,  2.9785],
        [-1.3857, -2.9754,  2.4187]], grad_fn=<MmBackward0>)
Output Shape: torch.Size([4, 3])
We have successfully built a GCN from scratch. Each row in the output tensor is a new representation for a node, calculated by aggregating information from its local neighborhood as defined by the graph structure. In the next chapter, we will see how to use these logits with a loss function to train the model's weights and make accurate predictions for a task like node classification.
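As a small preview (only a sketch; the model is still untrained, so these predictions carry no real meaning yet), the logits can already be converted into class probabilities with a softmax and into hard predictions with an argmax:

# Convert logits to per-node class probabilities and predicted labels
probs = F.softmax(logits, dim=1)
preds = probs.argmax(dim=1)
print(probs)
# With the randomly initialized weights that produced the logits above,
# every node happens to receive the highest score for class 2
print(preds)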