While oversmoothing deals with the homogenization of node features in deep GNNs, oversquashing describes a related but distinct challenge: the difficulty of propagating information between distant nodes when the graph structure creates information bottlenecks. Think of it less like features becoming blurry and more like a message getting stuck in transit due to limited pathways.
This problem arises because standard message passing GNNs aggregate information from neighbors at each layer. The representation $h_v^{(k)}$ of node $v$ at layer $k$ is computed from its own features $h_v^{(k-1)}$ and the features $h_u^{(k-1)}$ of its neighbors $u \in \mathcal{N}(v)$. To capture information from nodes $L$ hops away, we need at least $L$ message passing layers.
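As a minimal sketch of such a layer (assuming NumPy; the function name and the weight names `W_self` and `W_neigh` are illustrative, not taken from a particular library), mean aggregation over neighbors might look like this:

```python
import numpy as np

def message_passing_layer(H, adj, W_self, W_neigh):
    """One mean-aggregation message-passing step.

    H        : (n, d) node features from layer k-1
    adj      : adjacency list; adj[v] is the list of neighbors of v
    W_self,
    W_neigh  : (d, d) weight matrices (illustrative parameters)
    """
    H_next = np.empty_like(H)
    for v in range(len(adj)):
        if adj[v]:
            msg = H[adj[v]].mean(axis=0)      # aggregate neighbor features
        else:
            msg = np.zeros(H.shape[1])        # isolated node: no messages
        H_next[v] = np.tanh(H[v] @ W_self + msg @ W_neigh)
    return H_next
```

Stacking $L$ such layers is what lets information travel $L$ hops, and it is exactly this repeated aggregation into a fixed-size vector that sets up the squashing problem described next.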
The core issue leading to oversquashing is that the number of nodes within an $L$-hop neighborhood can grow exponentially with $L$, especially in certain graph structures (such as trees or expander graphs). However, the GNN must compress all the necessary information from this potentially vast neighborhood into a fixed-size node embedding vector $h_v^{(L)}$. When the graph structure contains bottlenecks, meaning relatively few paths connect large regions of the graph, this compression becomes lossy. Information from nodes "on the other side" of the bottleneck gets "squashed" into the limited capacity of the embedding vectors passing through the bottleneck, effectively preventing distant nodes from significantly influencing each other's representations.
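To make this growth concrete, the sketch below (plain Python; the helper name is illustrative) counts the nodes within $L$ hops of the root of a complete binary tree. The count roughly doubles with every extra hop, while the embedding that must summarize it stays the same size:

```python
from collections import deque

def receptive_field_size(adj, v, L):
    """Count nodes within L hops of v via breadth-first search."""
    seen, frontier = {v}, deque([(v, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == L:
            continue
        for u in adj[node]:
            if u not in seen:
                seen.add(u)
                frontier.append((u, dist + 1))
    return len(seen)

# Build a complete binary tree of depth 10 as an adjacency list.
depth = 10
n = 2 ** (depth + 1) - 1
adj = [[] for _ in range(n)]
for child in range(1, n):
    parent = (child - 1) // 2
    adj[parent].append(child)
    adj[child].append(parent)

for L in range(1, 6):
    print(L, receptive_field_size(adj, 0, L))  # 3, 7, 15, 31, 63
```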
Consider two large, densely connected communities of nodes linked by only a few edges.
A graph with two dense communities (blue and green nodes) connected by a single bottleneck edge (red). Information passing between nodes in different communities (e.g., from B2 to A3) is restricted by this bottleneck.
If a GNN needs to propagate information from a node in the green community to a node in the blue community, all relevant signals must pass through the limited number of connections (in this diagram, just the single edge between A1 and B1). Each message passing step involves aggregation and transformation, and this bottleneck forces an aggressive compression of information flowing between the communities. Consequently, the influence of nodes diminishes rapidly across such bottlenecks.
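The sketch below builds such a graph with networkx (assuming it is available): `nx.barbell_graph(10, 0)` creates two 10-node complete graphs joined by a single bridge edge, and checking the cut between the communities confirms that every inter-community message must traverse that one edge:

```python
import networkx as nx

# Two dense communities joined by a single bottleneck edge,
# mirroring the figure: nodes 0-9 and 10-19, bridged by (9, 10).
G = nx.barbell_graph(10, 0)
left, right = set(range(10)), set(range(10, 20))

cross = [(u, v) for u, v in G.edges if (u in left) != (v in left)]
print("edges crossing the cut:", cross)   # [(9, 10)] -- the lone bridge
```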
Mathematically, oversquashing can be related to the properties of the Jacobian matrix of the GNN transformation. Let $f_{\text{GNN}}$ denote the function computed by the GNN layers, mapping initial node features $X$ to final embeddings $H^{(L)}$. Oversquashing implies that the partial derivative of a node $v$'s final embedding $h_v^{(L)}$ with respect to the initial feature $x_u$ of a distant node $u$ becomes vanishingly small:
$$\frac{\partial h_v^{(L)}}{\partial x_u} \approx 0 \quad \text{when } d(u,v) \text{ is large and } u, v \text{ are separated by a bottleneck.}$$

Here, $d(u,v)$ is the shortest-path distance. This means changes in the features of node $u$ have almost no impact on the final representation of node $v$, hindering the GNN's ability to learn long-range dependencies across structural bottlenecks. This phenomenon is particularly pronounced in graphs with highly negatively curved edges or tree-like structures, where neighborhoods grow exponentially but few alternative paths connect distant nodes.
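This sensitivity can be measured directly with PyTorch's autograd, assuming PyTorch is installed. The sketch below is an illustrative simplification, not a trained model: a parameter-free $\tanh$ message-passing stack on two small cliques joined by one bridge edge. The Jacobian norm for a node pair within one clique should dwarf the norm for a pair separated by the bridge:

```python
import torch

def gnn(X, A_hat, L=6):
    """L rounds of parameter-free message passing: H <- tanh(A_hat @ H),
    where A_hat is the row-normalized adjacency with self-loops."""
    H = X
    for _ in range(L):
        H = torch.tanh(A_hat @ H)
    return H

# Two 5-cliques bridged by the single edge (4, 5), echoing the figure.
n = 10
A = torch.zeros(n, n)
for block in (range(5), range(5, 10)):
    for i in block:
        for j in block:
            if i != j:
                A[i, j] = 1.0
A[4, 5] = A[5, 4] = 1.0
A_loop = A + torch.eye(n)
A_hat = A_loop / A_loop.sum(dim=1, keepdim=True)

X = torch.randn(n, 4)
# J has shape (n, d, n, d); J[v, :, u, :] is the sensitivity of
# h_v^{(L)} to the input feature x_u.
J = torch.autograd.functional.jacobian(lambda Z: gnn(Z, A_hat), X)
print("same community:    ", J[0, :, 1, :].norm().item())
print("across bottleneck: ", J[0, :, 9, :].norm().item())
```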
It's important to differentiate oversquashing from oversmoothing:

- Oversmoothing: node representations become increasingly similar as layers are stacked, so nodes lose their distinctive features. The primary driver is repeated aggregation over many layers.
- Oversquashing: information from an exponentially growing neighborhood must be compressed into fixed-size embedding vectors, so signals from distant nodes are lost. The primary driver is the graph's structure, particularly its bottlenecks, and adding layers alone does not fix it.
While distinct, both problems limit the ability of standard message-passing GNNs to effectively utilize information from distant parts of the graph, particularly in deep models or graphs with challenging structures. Addressing oversquashing often requires architectural changes or techniques that go beyond simple message passing, such as graph rewiring, positional encodings, or employing architectures like Graph Transformers which can model longer-range interactions more directly, topics we explore later in the course. Techniques discussed for oversmoothing, like residual connections, might not directly solve the information bottleneck problem inherent in oversquashing.
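As a taste of what rewiring can do, here is a deliberately naive sketch (the function name and distance threshold are made up for illustration, and practical methods such as curvature-based rewiring choose edges far more selectively) that adds shortcut edges between far-apart node pairs, assuming networkx is available:

```python
import networkx as nx

def rewire_add_shortcuts(G, max_dist=4):
    """Naive rewiring sketch: add an edge between any pair of nodes
    farther apart than max_dist, shortening long paths through the
    bottleneck. Illustrative only."""
    H = G.copy()
    dist = dict(nx.all_pairs_shortest_path_length(G))
    for u in G:
        for v in G:
            if u < v and dist[u][v] > max_dist:
                H.add_edge(u, v)
    return H

G = nx.barbell_graph(10, 4)                   # bottleneck path between cliques
H = rewire_add_shortcuts(G, max_dist=4)
print(nx.diameter(G), "->", nx.diameter(H))   # long paths collapse
```

Even this crude version shortens the paths that messages must take, hinting at why rewiring can relieve bottlenecks; the principled variants are covered later in the course.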