While the mathematical origins of Graph Convolutional Networks are rooted in spectral graph theory, which involves complex operations like graph Fourier transforms, a more direct and accessible way to understand them is from a spatial perspective. This view frames the graph convolution as a message passing operation performed directly on the graph's structure, aligning perfectly with the neighborhood aggregation framework.
To build intuition, consider how a Convolutional Neural Network (CNN) operates on an image. A CNN uses a small kernel, or filter, that slides across the grid of pixels. At each position, the kernel computes a weighted sum of the pixel values in its immediate, structured neighborhood. This process aggregates local information into a new feature representation for each pixel.
A GCN performs a similar function, but on an irregular, unstructured graph. Instead of the fixed-size grid of neighbors found in an image, each node's neighborhood is defined by the edges connected to it. The "convolution" operation for a node consists of aggregating the feature vectors from its neighbors.
In its simplest form, a GCN layer updates a node's feature vector by taking the average of its neighbors' feature vectors. This operation directly mirrors the goal of a CNN kernel: to create a new representation for a point based on the features of its local environment.
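As a concrete illustration, here is a minimal sketch of this averaging step in plain NumPy. The function name, the adjacency-list format, and the inclusion of a self-loop are illustrative choices, not part of any particular library:

```python
import numpy as np

def mean_aggregate(features, neighbors):
    """Update each node's feature vector with the average of its
    neighbors' vectors (plus its own, via a self-loop)."""
    new_features = np.zeros_like(features)
    for i, nbrs in neighbors.items():
        group = nbrs + [i]  # neighborhood including the node itself
        new_features[i] = features[group].mean(axis=0)
    return new_features

# Toy graph: node 0 is connected to nodes 1 and 2.
features = np.array([[1.0, 0.0],
                     [0.0, 1.0],
                     [2.0, 2.0]])
neighbors = {0: [1, 2], 1: [0], 2: [0]}
print(mean_aggregate(features, neighbors))  # node 0 becomes the mean of all three vectors
```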
Let's look at the GCN layer's operation not as a single matrix multiplication, but as a process that happens at each node. The update for a single node $i$ can be understood as an "aggregate and update" step:
The update rule for the hidden representation $h_i^{(l+1)}$ of node $i$ at layer $l+1$ is given by:

$$h_i^{(l+1)} = \sigma\left( \sum_{j \in \mathcal{N}(i) \cup \{i\}} \frac{1}{\sqrt{\deg(i)\,\deg(j)}} \, h_j^{(l)} W^{(l)} \right)$$

Here:
- $\mathcal{N}(i)$ is the set of neighbors of node $i$; the union with $\{i\}$ adds a self-loop so the node's own features are included.
- $h_j^{(l)}$ is the feature vector of node $j$ at layer $l$.
- $W^{(l)}$ is the layer's learnable weight matrix.
- $\deg(i)$ is the degree of node $i$, counting the self-loop.
- $\sigma$ is a non-linear activation function such as ReLU.
This formula directly translates the spatial idea into math. For each node $i$, we loop over its neighbors $j$ (including the node itself), take their current feature vectors $h_j^{(l)}$, transform them with the weight matrix $W^{(l)}$, and sum the results, with each term weighted according to the node degrees.
Figure: A node-centric view of a graph convolution. Node A updates its representation by aggregating transformed feature vectors ($h$) from its neighbors (B, C, D) and itself.
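The per-node loop can also be written out directly. This is a sketch of the update rule above, not an efficient implementation: `features` is assumed to be an `(N, d)` array, `neighbors` an adjacency list, `deg[i]` the degree of node `i` counting its self-loop, `W` a learnable `(d, d')` weight matrix, and ReLU stands in for $\sigma$:

```python
import numpy as np

def gcn_update_node(i, features, neighbors, deg, W):
    """Aggregate-and-update for a single node i:
    sum over j in N(i) plus self of h_j W / sqrt(deg(i) * deg(j)), then ReLU."""
    out = np.zeros(W.shape[1])
    for j in neighbors[i] + [i]:               # neighbors plus the self-loop
        norm = 1.0 / np.sqrt(deg[i] * deg[j])  # symmetric normalization
        out += norm * (features[j] @ W)        # transform, scale, accumulate
    return np.maximum(out, 0.0)                # sigma = ReLU
```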
The normalization term, $\frac{1}{\sqrt{\deg(i)\,\deg(j)}}$, is a distinguishing feature of the GCN model. While simply averaging by the degree of the central node ($\frac{1}{\deg(i)}$) might seem sufficient, this symmetric normalization has better empirical performance. It accounts for the degrees of both the source and destination nodes in the message passing step, preventing the scale of feature vectors from being overly influenced by high-degree nodes.
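To see the difference concretely, consider a low-degree node $i$ receiving a message from a high-degree neighbor $j$; the degrees below are chosen arbitrarily for illustration:

```python
import numpy as np

deg_i, deg_j = 2, 100
print(1.0 / deg_i)                   # averaging over i's neighbors: 0.5
print(1.0 / np.sqrt(deg_i * deg_j))  # symmetric normalization: ~0.0707
```

The symmetric version shrinks the weight on messages coming from high-degree nodes, which is exactly the scale-balancing effect described above.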
By viewing the graph convolution spatially, we can see that the GCN layer is an efficient, specialized implementation of the message passing scheme. It fixes a specific choice for the AGGREGATE function (a degree-normalized sum) and for the UPDATE function (a linear transformation followed by a non-linearity). This perspective makes it easier to compare GCNs with other architectures like GraphSAGE and GAT, which simply make different choices for these functions.
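Finally, the per-node loop collapses into the single matrix expression mentioned at the start of this walkthrough. The sketch below assumes a dense adjacency matrix `A` and a ReLU non-linearity; practical implementations use sparse operations instead:

```python
import numpy as np

def gcn_layer(A, H, W):
    """Whole-graph GCN layer: sigma(D^{-1/2} (A + I) D^{-1/2} H W),
    equivalent to running the per-node aggregate-and-update at every node."""
    A_hat = A + np.eye(A.shape[0])            # add self-loops
    deg = A_hat.sum(axis=1)                   # degrees, counting self-loops
    D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))  # D^{-1/2}
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W, 0.0)
```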