Implement a simplified Graph Neural Network (GNN) layer from scratch using only Python and NumPy. This involves translating the core aggregation and update steps of the message passing mechanism directly into code. The goal is to make the mechanics of a GNN layer tangible through practical application.
To begin, we need a graph. Let's use a small, simple graph with four nodes and four edges. This size is perfect for inspecting the matrices and verifying our calculations by hand.
A simple undirected graph with four nodes and their connections.
Our goal is to write a function that takes this graph's structure and initial node features and computes updated node features after one layer of message passing.
First, we represent the graph's two components, its structure and its node features, as NumPy arrays. As we learned in Chapter 1, this means an adjacency matrix A and a node feature matrix X (which we can also call H for hidden states).
Let's assume our nodes have initial features of size 2. For instance, this could be the result of some initial feature extraction or embedding process.
import numpy as np
# Adjacency Matrix (A)
# Represents the connections between nodes. A[i, j] = 1 if node i and j are connected.
A = np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0]
])
# Node Feature Matrix (X or H)
# Each row corresponds to a node, and each column is a feature.
# Here, we have 4 nodes and 2 features per node.
X = np.array([
    [0.1, 0.2],
    [0.3, 0.4],
    [0.5, 0.6],
    [0.7, 0.8]
])
# Let's also define a weight matrix for our GNN layer.
# The layer will transform the 2-dimensional features into 3-dimensional ones.
# Input dimension = 2, Output dimension = 3
W = np.random.rand(2, 3)
print("Adjacency Matrix (A):\n", A)
print("\nNode Features (X):\n", X)
print("\nWeights (W):\n", W)
The core idea of message passing is to aggregate information from a node's neighbors. A straightforward and powerful way to perform this aggregation for all nodes at once is through matrix multiplication. Multiplying the adjacency matrix A by the feature matrix X gives us exactly what we need.
Let's see what this operation produces:
# Aggregate neighbor features
M = A @ X
print("Aggregated Features (M = A @ X):\n", M)
The resulting matrix M has the same dimensions as X. Each row M[i] is the sum of the feature vectors of the neighbors of node i. For example, node 0 is connected to nodes 1 and 2. Therefore, the first row of M is the sum of the feature vectors for node 1 ([0.3, 0.4]) and node 2 ([0.5, 0.6]), which results in [0.8, 1.0]. This single matrix multiplication efficiently performs the AGGREGATE step for all nodes in parallel.
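As a quick sanity check, we can confirm the hand calculation for node 0 directly in code; the assertion below is an illustrative check, not part of the original listing:
# Node 0's neighbors are nodes 1 and 2, so M[0] should equal X[1] + X[2]
np.testing.assert_allclose(M[0], X[1] + X[2])
print(M[0])  # node 0's aggregated features: [0.8, 1.0]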
The simple aggregation A @ X has two shortcomings:
1. It ignores a node's own features. Since A[i, i] = 0, node i's new representation contains no trace of its current representation.
2. It sums rather than averages. Nodes with many neighbors accumulate much larger feature values than nodes with few neighbors, and this degree bias can make training unstable.
We can solve both issues with two simple modifications.
To include a node's own features in the aggregation, we simply add a self-loop to every node. In terms of the adjacency matrix, this means adding an identity matrix I.
# Add self-loops to the adjacency matrix
A_hat = A + np.eye(A.shape[0])
print("Adjacency Matrix with Self-Loops (A_hat):\n", A_hat)
Now, when we multiply A_hat @ X, each node's aggregation will include its own features along with its neighbors'.
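We can verify this with a small check; the variable name M_self is ours, not part of the original listing:
# Aggregate again, this time including each node's own features
M_self = A_hat @ X
print("Aggregation with self-loops (A_hat @ X):\n", M_self)
# Node 0 now combines its own features with those of nodes 1 and 2:
# [0.1, 0.2] + [0.3, 0.4] + [0.5, 0.6] = [0.9, 1.2]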
To address the degree bias, we normalize the aggregated features by node degree. The simplest option divides each node's aggregated sum by its degree, which turns the sum into a mean of the neighbor features. Graph Convolutional Networks (GCNs) popularized a slightly more careful symmetric variant that scales each edge by the degrees of both endpoints.
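Before moving to the symmetric version, here is a minimal sketch of the plain mean aggregation; the variable names deg and M_mean are ours for illustration:
# Row-normalized (mean) aggregation: each row of A_hat @ X is divided
# by that node's degree (self-loop included)
deg = np.sum(A_hat, axis=1)
M_mean = (A_hat @ X) / deg[:, None]
print("Mean aggregation:\n", M_mean)
# Node 0 now receives the average of nodes 0, 1, and 2: [0.3, 0.4]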
The symmetric normalization formula is:
$$A_{\text{norm}} = \hat{D}^{-1/2}\,\hat{A}\,\hat{D}^{-1/2}$$

Here, $\hat{A}$ is the adjacency matrix with self-loops, and $\hat{D}$ is the diagonal degree matrix computed from $\hat{A}$.
Let's implement this:
# Compute the Degree Matrix (D_hat)
D_hat = np.diag(np.sum(A_hat, axis=1))
print("Degree Matrix (D_hat):\n", D_hat)
# Compute the inverse square root of the degree matrix
# The self-loops guarantee every degree is at least 1, so there is
# no division by zero even for otherwise isolated nodes
D_hat_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(D_hat)))
# Normalize the adjacency matrix
A_norm = D_hat_inv_sqrt @ A_hat @ D_hat_inv_sqrt
print("\nNormalized Adjacency Matrix:\n", A_norm)
This normalized adjacency matrix A_norm can now be used to perform a more stable and effective aggregation.
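To see the effect, we can compare the raw and normalized aggregations side by side; this small check is ours, not part of the original listing:
# Compare the raw and normalized aggregations
print("Raw aggregation (A_hat @ X):\n", A_hat @ X)
print("\nNormalized aggregation (A_norm @ X):\n", A_norm @ X)
# The normalized rows stay on a comparable scale regardless of how many
# neighbors a node has, which keeps values from growing as layers stack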
After aggregation, the UPDATE step applies a learnable linear transformation followed by a non-linear activation function. This is identical to the operation inside a standard dense layer of a neural network.
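To make that analogy concrete, here is a minimal sketch (the names H_dense and H_gnn are ours); the only difference between the two lines is the multiplication by the normalized adjacency matrix:
# A standard dense layer transforms each node's features in isolation
H_dense = np.maximum(0, X @ W)
# Our GNN layer applies the same transformation, but to the
# neighborhood-aggregated features instead of the raw ones
H_gnn = np.maximum(0, A_norm @ X @ W)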
The complete formula for our simple GNN layer is:
$$H^{(l+1)} = \sigma\left(\hat{D}^{-1/2}\,\hat{A}\,\hat{D}^{-1/2}\,H^{(l)}\,W^{(l)}\right)$$

where $\sigma$ is a non-linear activation function such as ReLU.
Let's encapsulate this logic into a reusable Python function. This function represents one complete message passing layer.
def gnn_layer(A, X, W):
    """
    Performs one layer of Graph Neural Network message passing.

    Args:
        A (np.array): The adjacency matrix of the graph.
        X (np.array): The node feature matrix.
        W (np.array): The weight matrix for the layer.

    Returns:
        np.array: The updated node feature matrix.
    """
    # 1. Add self-loops to the adjacency matrix
    A_hat = A + np.eye(A.shape[0])
    # 2. Compute the degree matrix
    D_hat = np.diag(np.sum(A_hat, axis=1))
    # 3. Compute the inverse square root of the degree matrix
    #    (the self-loops guarantee every degree is at least 1)
    D_hat_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(D_hat)))
    # 4. Normalize the adjacency matrix
    A_norm = D_hat_inv_sqrt @ A_hat @ D_hat_inv_sqrt
    # 5. Perform the aggregation and update steps
    #    (A_norm @ X) aggregates neighbor features
    #    (... @ W) applies the linear transformation (update)
    H_prime = A_norm @ X @ W
    # 6. Apply a non-linear activation function (ReLU)
    H_next = np.maximum(0, H_prime)
    return H_next
# Run our graph data through the GNN layer
H_next = gnn_layer(A, X, W)
print("Original Node Features (X):\n", X)
print("\nUpdated Node Features after one GNN layer (H_next):\n", H_next)
The output H_next is a 4x3 matrix. Each row is a new feature vector for the corresponding node, computed by taking a degree-normalized combination of that node's own features and its neighbors' features and then transforming the result into a new, higher-dimensional space. This process has updated each node's representation based on its local neighborhood.
By implementing this layer from scratch, you have built the fundamental component of many powerful GNN architectures. In the next chapter, we will see how this exact logic forms the basis for the Graph Convolutional Network (GCN) and explore other variations on the AGGREGATE and UPDATE functions.
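As a preview of how this building block composes, the sketch below stacks two of our layers; the weight shapes W1 and W2 are hypothetical choices for illustration. After the second layer, each node's representation reflects its two-hop neighborhood.
# Stack two layers: information propagates one hop per layer
W1 = np.random.rand(2, 3)  # first layer: 2 -> 3 features
W2 = np.random.rand(3, 4)  # second layer: 3 -> 4 features (hypothetical sizes)

H1 = gnn_layer(A, X, W1)
H2 = gnn_layer(A, H1, W2)
print("Two-layer output shape:", H2.shape)  # (4, 4)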