A single Graph Neural Network (GNN) layer updates each node's features using information from that node's immediate neighbors. While this layer is a powerful building block, its perspective is limited to a 1-hop neighborhood. To capture information from more distant parts of the graph, GNN layers are stacked, forming a deep Graph Neural Network. This layering mirrors how standard deep neural networks build hierarchies of features.
The primary reason for stacking GNN layers is to expand each node's receptive field. A node's receptive field is the set of nodes in the graph that can influence its final representation.
With each additional GNN layer, the receptive field of every node expands by one hop. A GNN with K layers can therefore propagate information between nodes that are up to K hops apart. This allows the model to learn features based on larger sub-structures within the graph.
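To make this concrete, here is a minimal sketch that computes a node's K-hop receptive field with a breadth-first search. The function name `receptive_field` and the small chain graph are hypothetical choices for illustration, not part of any GNN library.

```python
from collections import deque

def receptive_field(adj, start, k):
    """Return the set of nodes within k hops of `start` (its k-hop receptive field)."""
    visited = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue
        for neighbor in adj[node]:
            if neighbor not in visited:
                visited.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return visited

# Hypothetical chain graph: 0 - 1 - 2 - 3 - 4
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(receptive_field(adj, 0, 1))  # {0, 1}: one layer sees only 1-hop neighbors
print(receptive_field(adj, 0, 2))  # {0, 1, 2}: two layers reach nodes 2 hops away
```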
The diagram below illustrates this process. To compute the final representation for Node A after two layers, the model first aggregates information from its 1-hop neighbors (B and C) in the first layer. In the second layer, it aggregates the updated representations of B and C. Because the representations of B and C already contain information from their neighbors (D and E, respectively), Node A's final representation is influenced by its 2-hop neighbors.
Figure: The flow of information to Node A over two GNN layers. After the second layer, Node A's representation incorporates information from nodes D and E, which are two hops away.
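This two-layer flow can also be checked numerically. The sketch below encodes the figure's graph (edges A-B, A-C, B-D, C-E) and uses a simple mean over each node and its neighbors as a stand-in for one GNN layer; mean aggregation here is an illustrative assumption rather than a specific architecture. Tracking one-hot features shows that Node A only picks up signal from D and E after the second layer.

```python
import numpy as np

# Graph from the figure: A-B, A-C, B-D, C-E (undirected).
nodes = ["A", "B", "C", "D", "E"]
edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "E")]
idx = {n: i for i, n in enumerate(nodes)}

# Adjacency matrix with self-loops, so each node keeps part of its own signal.
A_hat = np.eye(len(nodes))
for u, v in edges:
    A_hat[idx[u], idx[v]] = A_hat[idx[v], idx[u]] = 1.0

# Row-normalize: each "layer" replaces a node's features with the mean over itself and its neighbors.
P = A_hat / A_hat.sum(axis=1, keepdims=True)

# One-hot features: column j of H tracks how much of node j's original signal each node holds.
H = np.eye(len(nodes))
H1 = P @ H        # after layer 1: A mixes in B and C only
H2 = P @ H1       # after layer 2: A also mixes in D and E

print(np.round(H1[idx["A"]], 2))  # nonzero only in the A, B, C positions
print(np.round(H2[idx["A"]], 2))  # now also nonzero in the D and E positions
```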
Formally, a multi-layer GNN works by composing several message passing layers. The output embeddings of layer $l$, denoted as $H^{(l)}$, become the input embeddings for layer $l+1$. The process starts with the initial node features, $H^{(0)} = X$.
For a node $v$ with neighborhood $\mathcal{N}(v)$, the computation for the first two layers proceeds as follows:

$$h_v^{(1)} = \text{UPDATE}^{(1)}\left(h_v^{(0)},\ \text{AGGREGATE}^{(1)}\left(\{h_u^{(0)} : u \in \mathcal{N}(v)\}\right)\right)$$

$$h_v^{(2)} = \text{UPDATE}^{(2)}\left(h_v^{(1)},\ \text{AGGREGATE}^{(2)}\left(\{h_u^{(1)} : u \in \mathcal{N}(v)\}\right)\right)$$

where $h_v^{(0)} = x_v$ is the initial feature vector of node $v$.
This process is repeated for $K$ layers. The AGGREGATE and UPDATE functions for each layer typically have their own set of trainable parameters (e.g., weight matrices), allowing the network to learn different feature transformations at different depths. The final output of the $K$-th layer, $h_v^{(K)}$, is the node embedding used for the downstream task.
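As a minimal sketch of this composition, the code below assumes mean aggregation and a per-layer linear transform followed by a ReLU as the UPDATE. The helper `gnn_forward`, the toy graph, and the layer sizes are hypothetical choices for illustration; real architectures define AGGREGATE and UPDATE differently.

```python
import numpy as np

rng = np.random.default_rng(0)

def gnn_forward(A, X, weights):
    """Run a K-layer GNN sketch: mean aggregation plus a per-layer linear UPDATE and ReLU.

    A: (n, n) adjacency matrix, X: (n, d) initial features H^(0),
    weights: list of K weight matrices, one per layer (the trainable parameters).
    """
    A_hat = A + np.eye(A.shape[0])                    # add self-loops
    P = A_hat / A_hat.sum(axis=1, keepdims=True)      # mean AGGREGATE over self + neighbors
    H = X                                             # H^(0) = X
    for W in weights:                                 # layer l: H^(l) = ReLU(P H^(l-1) W^(l))
        H = np.maximum(P @ H @ W, 0.0)                # UPDATE: linear transform + nonlinearity
    return H                                          # H^(K): embeddings for the downstream task

# Tiny example: 4-node cycle graph, 8-dim features, K = 2 layers with their own weights.
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
X = rng.normal(size=(4, 8))
weights = [rng.normal(size=(8, 16)), rng.normal(size=(16, 4))]
print(gnn_forward(A, X, weights).shape)  # (4, 4)
```

Note that `weights` holds one matrix per layer, reflecting the point above that each depth has its own trainable parameters.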
While depth allows GNNs to access a wider graph context, there is a significant drawback to making them too deep: over-smoothing. Over-smoothing is a phenomenon where, after many message passing iterations, the representations of all nodes in a connected graph converge to a similar value.
Think of it like dropping a bit of colored dye into a pool of water. After one stir (one GNN layer), the color spreads to its immediate vicinity. After many stirs, the dye diffuses evenly throughout the entire pool, making it impossible to tell where the dye originated. Similarly, as node features are repeatedly averaged with their neighbors, they lose their initial, distinguishing information.
When node embeddings become indistinguishable, the model's performance on tasks like node classification degrades significantly, as it can no longer tell the nodes apart. Because of over-smoothing, most GNN architectures used in practice are relatively shallow, often consisting of only 2 to 4 layers. Mitigating this issue is an active area of GNN research, leading to more complex architectures with skip connections or other mechanisms to preserve initial node information.
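Over-smoothing is easy to observe numerically. Assuming a bare mean-aggregation step with no learned UPDATE, the sketch below applies it repeatedly to random features on a small connected graph and measures how far each node's embedding sits from the average embedding; the graph construction and layer counts are arbitrary choices for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical demo: apply mean aggregation repeatedly and watch node embeddings converge.
n = 20
A = np.zeros((n, n))
for i in range(n):                                    # ring, so the graph is connected
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
chords = np.triu(rng.random((n, n)) < 0.2, k=1).astype(float)
A = np.maximum(A, chords + chords.T)                  # add random chords for a denser graph

A_hat = A + np.eye(n)
P = A_hat / A_hat.sum(axis=1, keepdims=True)          # mean over self + neighbors

H = rng.normal(size=(n, 8))                           # random initial node features
for layer in range(1, 31):
    H = P @ H                                         # one smoothing step (no learned UPDATE)
    spread = np.linalg.norm(H - H.mean(axis=0), axis=1).mean()
    if layer in (1, 2, 4, 8, 16, 30):
        print(f"layer {layer:2d}: avg. distance from the mean embedding = {spread:.4f}")
# The spread shrinks with depth: after many layers the rows of H are nearly identical,
# which is the over-smoothing effect described above.
```

As a rough intuition for the skip-connection style remedies mentioned above, mixing the initial features back into each step (for example `H = 0.9 * (P @ H) + 0.1 * H0`, where `H0` holds the original features) keeps the embeddings from collapsing to a single point.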
Understanding this layered structure is fundamental. In the next chapter, we will examine specific, influential architectures like GCN and GraphSAGE that define concrete forms for the AGGREGATE and UPDATE functions within this multi-layer framework.