While oversmoothing describes the homogenization of node features in deep Graph Neural Networks (GNNs), oversquashing presents a related but distinct challenge: the difficulty of propagating information between distant nodes when the graph structure creates information bottlenecks. Rather than features merely becoming blurry, information effectively gets stuck in transit because there are too few pathways to carry it.

This problem arises because standard message-passing GNNs aggregate information from neighbors at each layer. The representation $h_v^{(k)}$ of node $v$ at layer $k$ is computed from its own features $h_v^{(k-1)}$ and the features $h_u^{(k-1)}$ of its neighbors $u \in \mathcal{N}(v)$. To capture information from nodes $L$ hops away, we need at least $L$ message-passing layers.

The core issue leading to oversquashing is that the number of nodes within an $L$-hop neighborhood can grow exponentially with $L$, especially in certain graph structures (such as trees or expander graphs). However, the GNN must compress all the necessary information from this potentially extensive neighborhood into a fixed-size node embedding vector $h_v^{(L)}$. When the graph structure contains bottlenecks, meaning relatively few paths connect large regions of the graph, this compression becomes lossy.
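The layer-wise update above can be made concrete with a minimal sketch. This is not a practical implementation: mean aggregation and fixed scalar mixing weights (`w_self`, `w_neigh`, both hypothetical choices) stand in for the learned transformation, but the structure mirrors the recurrence for $h_v^{(k)}$.

```python
# Minimal sketch of message passing with mean aggregation.
# The scalar weights w_self and w_neigh are illustrative stand-ins
# for a learned transformation; real GNN layers use weight matrices
# and nonlinearities.

def message_passing_step(features, neighbors, w_self=0.5, w_neigh=0.5):
    """Compute h_v^{(k)} from h_v^{(k-1)} and the neighbor features.

    features:  dict node -> feature vector (list of floats)
    neighbors: dict node -> list of neighboring nodes
    """
    new_features = {}
    for v, h_v in features.items():
        # Aggregate neighbor features by averaging.
        agg = [0.0] * len(h_v)
        for u in neighbors[v]:
            for i, x in enumerate(features[u]):
                agg[i] += x / len(neighbors[v])
        # Combine self features with the aggregated message.
        new_features[v] = [w_self * s + w_neigh * a for s, a in zip(h_v, agg)]
    return new_features


def gnn_forward(features, neighbors, num_layers):
    """Stack num_layers message-passing steps; information can travel
    at most num_layers hops from its source node."""
    for _ in range(num_layers):
        features = message_passing_step(features, neighbors)
    return features
```

On a path graph `0 - 1 - 2`, a signal placed on node 0 reaches node 2 only after two such steps, illustrating why at least $L$ layers are needed for $L$-hop influence.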
Information from nodes "on the other side" of the bottleneck gets "squashed" into the limited capacity of the embedding vectors passing through the bottleneck, effectively preventing distant nodes from significantly influencing each other's representations.

## Understanding the Bottleneck Effect

Consider two large, densely connected communities of nodes linked by only a few edges.

```dot
graph G {
    node [shape=circle, style=filled, fontsize=10];
    A1 [fillcolor="#74c0fc"]; A2 [fillcolor="#74c0fc"];
    A3 [fillcolor="#74c0fc"]; A4 [fillcolor="#74c0fc"];
    B1 [fillcolor="#69db7c"]; B2 [fillcolor="#69db7c"];
    B3 [fillcolor="#69db7c"]; B4 [fillcolor="#69db7c"];
    A1 -- A2; A2 -- A3; A3 -- A4; A4 -- A1; A1 -- A3; A2 -- A4;
    B1 -- B2; B2 -- B3; B3 -- B4; B4 -- B1; B1 -- B3; B2 -- B4;
    A1 -- B1 [penwidth=2.5, color="#f03e3e"];
}
```

*A graph with two dense communities (blue and green nodes) connected by a single bottleneck edge (red). Information passing between nodes in different communities (e.g., from B2 to A3) is restricted by this bottleneck.*

If a GNN needs to propagate information from a node in the green community to a node in the blue community, all relevant signals must pass through the limited number of connections (in this diagram, just the single edge between A1 and B1). Each message-passing step involves aggregation and transformation, and this bottleneck forces an aggressive compression of the information flowing between the communities. Consequently, the influence of nodes diminishes rapidly across such bottlenecks.

Mathematically, oversquashing can be related to the properties of the Jacobian matrix of the GNN transformation. Let $f_{\text{GNN}}$ represent the function computed by the GNN layers mapping initial node features $X$ to final embeddings $H^{(L)}$.
Oversquashing implies that the partial derivative of a node $v$'s final embedding $h_v^{(L)}$ with respect to the initial feature $x_u$ of a distant node $u$ becomes vanishingly small:

$$ \left\| \frac{\partial h_v^{(L)}}{\partial x_u} \right\| \approx 0 \quad \text{when } d(u, v) \text{ is large and } u, v \text{ are separated by a bottleneck} $$

Here, $d(u, v)$ is the shortest-path distance. This means changes in the features of node $u$ have almost no impact on the final representation of node $v$, hindering the GNN's ability to learn long-range dependencies across structural bottlenecks. The phenomenon is particularly pronounced in negatively curved, tree-like regions of a graph, where neighborhoods grow exponentially with distance and few alternative paths connect distant nodes.

## Consequences of Oversquashing

- **Poor long-range dependency learning:** GNNs struggle to capture relationships between nodes separated by bottlenecks, limiting performance on tasks requiring global graph understanding.
- **Limited receptive field:** Although theoretically capable of reaching distant nodes with enough layers, the effective receptive field is constrained by these structural bottlenecks, preventing meaningful information flow.
- **Task performance degradation:** Tasks like predicting molecular properties (where distant atoms influence function) or certain types of node classification in social networks can be severely affected.

## Distinguishing Oversquashing from Oversmoothing

It's important to differentiate oversquashing from oversmoothing:

- **Oversmoothing:** Node representations become too similar (indistinguishable) as the number of layers increases, losing node-specific information. This is primarily a function of network depth and repeated averaging/aggregation.
- **Oversquashing:** Node representations fail to capture influence from distant nodes due to structural bottlenecks in the graph, limiting information flow regardless of feature similarity.
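The sensitivity $\partial h_v^{(L)} / \partial x_u$ can be estimated numerically. The sketch below uses a finite difference on an untrained mean-aggregation GNN over the eight-node, two-community graph pictured earlier (node names follow the diagram; the four layers, the 0.5/0.5 mixing weights, and scalar features are illustrative assumptions). The influence across the A1–B1 bottleneck comes out much smaller than the influence between two nodes inside the same community.

```python
# Finite-difference estimate of the influence |dh_target/dx_source|
# for a simple (untrained) mean-aggregation GNN on the two-community
# bottleneck graph from the figure.

adj = {
    "A1": ["A2", "A3", "A4", "B1"], "A2": ["A1", "A3", "A4"],
    "A3": ["A1", "A2", "A4"],       "A4": ["A1", "A2", "A3"],
    "B1": ["B2", "B3", "B4", "A1"], "B2": ["B1", "B3", "B4"],
    "B3": ["B1", "B2", "B4"],       "B4": ["B1", "B2", "B3"],
}

def forward(x, layers=4):
    """Scalar-feature GNN: h_v <- 0.5*h_v + 0.5*mean of neighbors, repeated."""
    h = dict(x)
    for _ in range(layers):
        h = {v: 0.5 * h[v] + 0.5 * sum(h[u] for u in adj[v]) / len(adj[v])
             for v in adj}
    return h

def influence(source, target, eps=1e-6):
    """Finite-difference estimate of |dh_target^(L) / dx_source|."""
    x0 = {v: 0.0 for v in adj}
    x1 = dict(x0)
    x1[source] = eps
    return abs(forward(x1)[target] - forward(x0)[target]) / eps

within = influence("A2", "A3")   # same community, many parallel paths
across = influence("B2", "A3")   # separated by the single bottleneck edge
assert across < within           # influence is squashed across the cut
```

Because this toy model is linear, the finite difference recovers the exact Jacobian entry; in a trained nonlinear GNN the same comparison can be made with automatic differentiation.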
This is primarily a function of graph structure and fixed embedding sizes.

While distinct, both problems limit the ability of standard message-passing GNNs to effectively utilize information from distant parts of the graph, particularly in deep models or graphs with challenging structures. Addressing oversquashing often requires architectural changes or techniques that extend simple message passing, such as graph rewiring, positional encodings, or architectures like Graph Transformers that model long-range interactions more directly; we discuss these topics later in the course. Techniques discussed for oversmoothing, like residual connections, might not directly solve the information bottleneck problem inherent in oversquashing.
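To give a flavor of graph rewiring, here is a deliberately naive heuristic: add "shortcut" edges between every pair of nodes whose shortest-path distance exceeds a cutoff, so that messages no longer have to squeeze through the bottleneck. This is a sketch only; the cutoff `max_dist` is an arbitrary assumption, and practical rewiring methods select edges with more principled criteria (e.g., curvature-based scores) covered later in the course.

```python
# Naive graph-rewiring sketch: connect all node pairs farther apart
# than max_dist with a direct shortcut edge.

from collections import deque

def bfs_distances(adj, source):
    """Shortest-path distances (in hops) from source via breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist

def rewire(adj, max_dist=2):
    """Return a new adjacency with shortcut edges for all pairs whose
    original distance exceeds max_dist."""
    new_adj = {v: set(ns) for v, ns in adj.items()}
    for v in adj:
        for u, d in bfs_distances(adj, v).items():
            if d > max_dist:
                new_adj[v].add(u)
                new_adj[u].add(v)
    return {v: sorted(ns) for v, ns in new_adj.items()}
```

Applied to the two-community graph from the figure, this would add a direct B3–A3 shortcut (their original distance is 3), letting the two communities exchange information without routing everything through the A1–B1 edge.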