While the mathematical origins of Graph Convolutional Networks are rooted in spectral graph theory, which involves complex operations like graph Fourier transforms, a more direct and accessible way to understand them is from a spatial perspective. This view frames the graph convolution as a message passing operation performed directly on the graph's structure, aligning perfectly with the neighborhood aggregation framework.
To build intuition, consider how a Convolutional Neural Network (CNN) operates on an image. A CNN uses a small kernel, or filter, that slides across the grid of pixels. At each position, the kernel computes a weighted sum of the pixel values in its immediate, structured neighborhood. This process aggregates local information into a new feature representation for each pixel.
A GCN performs a similar function, but on an irregular, unstructured graph. Instead of the fixed-size grid of neighbors found in an image, each node's neighborhood is defined by the edges connected to it. The "convolution" operation for a node consists of aggregating the feature vectors from its neighbors.
In its simplest form, a GCN layer updates a node's feature vector by taking the average of its neighbors' feature vectors. This operation directly mirrors the goal of a CNN kernel: to create a new representation for a point based on the features of its local environment.
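As a concrete illustration, here is a minimal sketch of this averaging step in plain NumPy. The function name, the adjacency-list format, and the inclusion of a self-loop are illustrative choices, not part of any particular library:

```python
import numpy as np

def mean_aggregate(features, neighbors):
    """Update each node's feature vector with the average of its
    neighbors' vectors (plus its own, via a self-loop)."""
    new_features = np.zeros_like(features)
    for i, nbrs in neighbors.items():
        group = nbrs + [i]  # neighborhood including the node itself
        new_features[i] = features[group].mean(axis=0)
    return new_features

# Toy graph: node 0 is connected to nodes 1 and 2.
features = np.array([[1.0, 0.0],
                     [0.0, 1.0],
                     [2.0, 2.0]])
neighbors = {0: [1, 2], 1: [0], 2: [0]}
print(mean_aggregate(features, neighbors))  # node 0 becomes the mean of all three vectors
```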
Let's look at the GCN layer's operation not as a single matrix multiplication, but as a process that happens at each node. The update for a single node $i$ can be understood as an "aggregate and update" step:
The update rule for the hidden representation $h_i^{(l+1)}$ of node $i$ at layer $l+1$ is given by:

$$h_i^{(l+1)} = \sigma\left( \sum_{j \in \mathcal{N}(i) \cup \{i\}} \frac{1}{\sqrt{\deg(i)\,\deg(j)}} \, h_j^{(l)} W^{(l)} \right)$$

Here:
- $\mathcal{N}(i)$ is the set of neighbors of node $i$; the union with $\{i\}$ adds a self-loop so the node's own features are included.
- $h_j^{(l)}$ is the feature vector of node $j$ at layer $l$.
- $W^{(l)}$ is the layer's learnable weight matrix.
- $\deg(i)$ is the degree of node $i$, counting the self-loop.
- $\sigma$ is a non-linear activation function such as ReLU.
This formula directly translates the spatial idea into math. For each node $i$, we loop over its neighbors $j$ (including the node itself), take their current feature vectors $h_j^{(l)}$, transform them with the weight matrix $W^{(l)}$, and sum the results, with each term weighted according to the node degrees.
Figure: A node-centric view of a graph convolution. Node A updates its representation by aggregating transformed feature vectors ($h$) from its neighbors (B, C, D) and itself.
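The per-node loop can also be written out directly. This is a sketch of the update rule above, not an efficient implementation: `features` is assumed to be an `(N, d)` array, `neighbors` an adjacency list, `deg[i]` the degree of node `i` counting its self-loop, `W` a learnable `(d, d')` weight matrix, and ReLU stands in for $\sigma$:

```python
import numpy as np

def gcn_update_node(i, features, neighbors, deg, W):
    """Aggregate-and-update for a single node i:
    sum over j in N(i) plus self of h_j W / sqrt(deg(i) * deg(j)), then ReLU."""
    out = np.zeros(W.shape[1])
    for j in neighbors[i] + [i]:               # neighbors plus the self-loop
        norm = 1.0 / np.sqrt(deg[i] * deg[j])  # symmetric normalization
        out += norm * (features[j] @ W)        # transform, scale, accumulate
    return np.maximum(out, 0.0)                # sigma = ReLU
```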
The normalization term, $\frac{1}{\sqrt{\deg(i)\,\deg(j)}}$, is a distinguishing feature of the GCN model. While simply averaging by the degree of the central node ($\frac{1}{\deg(i)}$) might seem sufficient, this symmetric normalization has better empirical performance. It accounts for the degrees of both the source and destination nodes in the message passing step, preventing the scale of feature vectors from being overly influenced by high-degree nodes.
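To see the difference concretely, consider a low-degree node $i$ receiving a message from a high-degree neighbor $j$; the degrees below are chosen arbitrarily for illustration:

```python
import numpy as np

deg_i, deg_j = 2, 100
print(1.0 / deg_i)                   # averaging over i's neighbors: 0.5
print(1.0 / np.sqrt(deg_i * deg_j))  # symmetric normalization: ~0.0707
```

The symmetric version shrinks the weight on messages coming from high-degree nodes, which is exactly the scale-balancing effect described above.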
By viewing the graph convolution spatially, we can see that the GCN layer is an efficient, specialized implementation of the message passing scheme. It fixes a specific choice for the AGGREGATE function (a degree-normalized sum) and for the UPDATE function (a linear transformation followed by a non-linearity). This perspective makes it easier to compare GCNs with other architectures like GraphSAGE and GAT, which simply make different choices for these functions.
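Finally, the per-node loop collapses into the single matrix expression mentioned at the start of this walkthrough. The sketch below assumes a dense adjacency matrix `A` and a ReLU non-linearity; practical implementations use sparse operations instead:

```python
import numpy as np

def gcn_layer(A, H, W):
    """Whole-graph GCN layer: sigma(D^{-1/2} (A + I) D^{-1/2} H W),
    equivalent to running the per-node aggregate-and-update at every node."""
    A_hat = A + np.eye(A.shape[0])            # add self-loops
    deg = A_hat.sum(axis=1)                   # degrees, counting self-loops
    D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))  # D^{-1/2}
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W, 0.0)
```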