Implement a simplified Graph Neural Network (GNN) layer from scratch using only Python and NumPy. This involves translating the core aggregation and update steps of the message passing mechanism directly into code. The goal is to make the mechanics of a GNN layer tangible through practical application.
To begin, we need a graph. Let's use a small, simple graph with four nodes and four edges. This size is perfect for inspecting the matrices and verifying our calculations by hand.
A simple undirected graph with four nodes and their connections.
Our goal is to write a function that takes this graph's structure and initial node features and computes updated node features after one layer of message passing.
First, we represent the graph's two components, its structure and its node features, as NumPy arrays. As we learned in Chapter 1, this means an adjacency matrix A and a node feature matrix X (which we can also call H for hidden states).
Let's assume our nodes have initial features of size 2. For instance, this could be the result of some initial feature extraction or embedding process.
import numpy as np
# Adjacency Matrix (A)
# Represents the connections between nodes. A[i, j] = 1 if node i and j are connected.
A = np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0]
])
# Node Feature Matrix (X or H)
# Each row corresponds to a node, and each column is a feature.
# Here, we have 4 nodes and 2 features per node.
X = np.array([
    [0.1, 0.2],
    [0.3, 0.4],
    [0.5, 0.6],
    [0.7, 0.8]
])
# Let's also define a weight matrix for our GNN layer.
# The layer will transform the 2-dimensional features into 3-dimensional ones.
# Input dimension = 2, Output dimension = 3
W = np.random.rand(2, 3)
print("Adjacency Matrix (A):\n", A)
print("\nNode Features (X):\n", X)
print("\nWeights (W):\n", W)
The core idea of message passing is to aggregate information from a node's neighbors. A straightforward and powerful way to perform this aggregation for all nodes at once is through matrix multiplication. Multiplying the adjacency matrix A by the feature matrix X gives us exactly what we need.
Let's see what this operation produces:
# Aggregate neighbor features
M = A @ X
print("Aggregated Features (M = A @ X):\n", M)
The resulting matrix M has the same dimensions as X. Each row M[i] is the sum of the feature vectors of the neighbors of node i. For example, node 0 is connected to nodes 1 and 2. Therefore, the first row of M is the sum of the feature vectors for node 1 ([0.3, 0.4]) and node 2 ([0.5, 0.6]), which results in [0.8, 1.0]. This single matrix multiplication efficiently performs the AGGREGATE step for all nodes in parallel.
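As a quick sanity check, we can confirm the hand calculation for node 0 directly in code; the assertion below is an illustrative check, not part of the original listing:
# Node 0's neighbors are nodes 1 and 2, so M[0] should equal X[1] + X[2]
np.testing.assert_allclose(M[0], X[1] + X[2])
print(M[0])  # node 0's aggregated features: [0.8, 1.0]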
The simple aggregation A @ X has two shortcomings:
1. It ignores a node's own features. Since A[i, i] = 0, node i's new representation contains no trace of its current representation.
2. It sums rather than averages. Nodes with many neighbors accumulate much larger feature values than nodes with few neighbors, and this degree bias can make training unstable.
We can solve both issues with two simple modifications.
To include a node's own features in the aggregation, we simply add a self-loop to every node. In terms of the adjacency matrix, this means adding an identity matrix I.
# Add self-loops to the adjacency matrix
A_hat = A + np.eye(A.shape[0])
print("Adjacency Matrix with Self-Loops (A_hat):\n", A_hat)
Now, when we multiply A_hat @ X, each node's aggregation will include its own features along with its neighbors'.
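We can verify this with a small check; the variable name M_self is ours, not part of the original listing:
# Aggregate again, this time including each node's own features
M_self = A_hat @ X
print("Aggregation with self-loops (A_hat @ X):\n", M_self)
# Node 0 now combines its own features with those of nodes 1 and 2:
# [0.1, 0.2] + [0.3, 0.4] + [0.5, 0.6] = [0.9, 1.2]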
To address the degree bias, we normalize the aggregated features by node degree. The simplest option divides each node's aggregated sum by its degree, which turns the sum into a mean of the neighbor features. Graph Convolutional Networks (GCNs) popularized a slightly more careful symmetric variant that scales each edge by the degrees of both endpoints.
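Before moving to the symmetric version, here is a minimal sketch of the plain mean aggregation; the variable names deg and M_mean are ours for illustration:
# Row-normalized (mean) aggregation: each row of A_hat @ X is divided
# by that node's degree (self-loop included)
deg = np.sum(A_hat, axis=1)
M_mean = (A_hat @ X) / deg[:, None]
print("Mean aggregation:\n", M_mean)
# Node 0 now receives the average of nodes 0, 1, and 2: [0.3, 0.4]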
The symmetric normalization formula is:
$$A_{\text{norm}} = \hat{D}^{-1/2}\,\hat{A}\,\hat{D}^{-1/2}$$

Here, $\hat{A}$ is the adjacency matrix with self-loops, and $\hat{D}$ is the diagonal degree matrix computed from $\hat{A}$.
Let's implement this:
# Compute the Degree Matrix (D_hat)
D_hat = np.diag(np.sum(A_hat, axis=1))
print("Degree Matrix (D_hat):\n", D_hat)
# Compute the inverse square root of the degree matrix
# The self-loops guarantee every degree is at least 1, so there is
# no division by zero even for otherwise isolated nodes
D_hat_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(D_hat)))
# Normalize the adjacency matrix
A_norm = D_hat_inv_sqrt @ A_hat @ D_hat_inv_sqrt
print("\nNormalized Adjacency Matrix:\n", A_norm)
This normalized adjacency matrix A_norm can now be used to perform a more stable and effective aggregation.
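To see the effect, we can compare the raw and normalized aggregations side by side; this small check is ours, not part of the original listing:
# Compare the raw and normalized aggregations
print("Raw aggregation (A_hat @ X):\n", A_hat @ X)
print("\nNormalized aggregation (A_norm @ X):\n", A_norm @ X)
# The normalized rows stay on a comparable scale regardless of how many
# neighbors a node has, which keeps values from growing as layers stack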
After aggregation, the UPDATE step applies a learnable linear transformation followed by a non-linear activation function. This is identical to the operation inside a standard dense layer of a neural network.
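To make that analogy concrete, here is a minimal sketch (the names H_dense and H_gnn are ours); the only difference between the two lines is the multiplication by the normalized adjacency matrix:
# A standard dense layer transforms each node's features in isolation
H_dense = np.maximum(0, X @ W)
# Our GNN layer applies the same transformation, but to the
# neighborhood-aggregated features instead of the raw ones
H_gnn = np.maximum(0, A_norm @ X @ W)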
The complete formula for our simple GNN layer is:
$$H^{(l+1)} = \sigma\left(\hat{D}^{-1/2}\,\hat{A}\,\hat{D}^{-1/2}\,H^{(l)}\,W^{(l)}\right)$$

where $\sigma$ is a non-linear activation function such as ReLU.
Let's encapsulate this logic into a reusable Python function. This function represents one complete message passing layer.
def gnn_layer(A, X, W):
    """
    Performs one layer of Graph Neural Network message passing.

    Args:
        A (np.array): The adjacency matrix of the graph.
        X (np.array): The node feature matrix.
        W (np.array): The weight matrix for the layer.

    Returns:
        np.array: The updated node feature matrix.
    """
    # 1. Add self-loops to the adjacency matrix
    A_hat = A + np.eye(A.shape[0])
    # 2. Compute the degree matrix
    D_hat = np.diag(np.sum(A_hat, axis=1))
    # 3. Compute the inverse square root of the degree matrix
    #    (the self-loops guarantee every degree is at least 1)
    D_hat_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(D_hat)))
    # 4. Normalize the adjacency matrix
    A_norm = D_hat_inv_sqrt @ A_hat @ D_hat_inv_sqrt
    # 5. Perform the aggregation and update steps
    #    (A_norm @ X) aggregates neighbor features
    #    (... @ W) applies the linear transformation (update)
    H_prime = A_norm @ X @ W
    # 6. Apply a non-linear activation function (ReLU)
    H_next = np.maximum(0, H_prime)
    return H_next
# Run our graph data through the GNN layer
H_next = gnn_layer(A, X, W)
print("Original Node Features (X):\n", X)
print("\nUpdated Node Features after one GNN layer (H_next):\n", H_next)
The output H_next is a 4x3 matrix. Each row is a new feature vector for the corresponding node, computed by taking a degree-normalized combination of that node's own features and its neighbors' features and then transforming the result into a new, higher-dimensional space. This process has updated each node's representation based on its local neighborhood.
By implementing this layer from scratch, you have built the fundamental component of many powerful GNN architectures. In the next chapter, we will see how this exact logic forms the basis for the Graph Convolutional Network (GCN) and explore other variations on the AGGREGATE and UPDATE functions.
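As a preview of how this building block composes, the sketch below stacks two of our layers; the weight shapes W1 and W2 are hypothetical choices for illustration. After the second layer, each node's representation reflects its two-hop neighborhood.
# Stack two layers: information propagates one hop per layer
W1 = np.random.rand(2, 3)  # first layer: 2 -> 3 features
W2 = np.random.rand(3, 4)  # second layer: 3 -> 4 features (hypothetical sizes)

H1 = gnn_layer(A, X, W1)
H2 = gnn_layer(A, H1, W2)
print("Two-layer output shape:", H2.shape)  # (4, 4)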