Understanding the capabilities and limitations of Graph Neural Networks requires evaluating their expressive power: their ability to distinguish between different graph structures and learn functions that depend on these structures. A fundamental question in graph theory is the graph isomorphism problem: determining if two graphs are structurally identical, even if their node labeling or drawing differs. While solving graph isomorphism efficiently is a long-standing open problem, certain heuristics provide valuable insights into graph similarity. One such heuristic, highly relevant to GNNs, is the Weisfeiler-Lehman (WL) test.
The most common variant discussed in the GNN literature is the 1-dimensional Weisfeiler-Lehman test (1-WL), also known as naive vertex refinement. It's an iterative algorithm that produces a label-based signature for a graph from local neighborhood structures. If two graphs produce different signatures after any iteration, they are definitively non-isomorphic. If they produce the same signature, however, they may or may not be isomorphic: matching signatures are a necessary but not sufficient condition for isomorphism.
Here's how the 1-WL test works:

1. Initialization ($k=0$): every node $v$ receives the same initial label $l_v^{(0)}$ (or its given node label, if the graph provides one).
2. Neighborhood aggregation: at iteration $k$, each node $v$ collects the multiset of its neighbors' labels from the previous iteration, $S_v^{(k)} = \{ l_u^{(k-1)} : u \in N(v) \}$.
3. Relabeling: the node's new label is obtained by hashing its previous label together with this multiset, $l_v^{(k)} = \text{HASH}\big(l_v^{(k-1)}, S_v^{(k)}\big)$, where identical pairs always receive the same new label.
4. Comparison: after each iteration, compare the histograms of labels in the two graphs. If they ever differ, the graphs are non-isomorphic; otherwise, repeat until the labels stop changing.
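To make these steps concrete, below is a minimal Python sketch of the refinement loop. The adjacency-dict encoding and the function name `wl_refine` are illustrative choices; to keep the sketch self-contained, the pair (previous label, sorted neighbor labels) is used directly as the new label instead of an explicit HASH function, which preserves exactly the same distinctions between nodes.

```python
from collections import Counter

def wl_refine(adj, num_iters=3):
    """Sketch of 1-WL (naive vertex refinement).

    adj maps each node to a list of its neighbors. Every node starts with the
    same label; at each step a node's new label is the pair (previous label,
    sorted multiset of neighbor labels). Using this pair directly as the label,
    rather than hashing it, keeps labels canonical so that label histograms can
    be compared across graphs."""
    labels = {v: "gray" for v in adj}        # k = 0: all nodes share one label
    history = [Counter(labels.values())]     # label histogram per iteration
    for _ in range(num_iters):
        labels = {
            v: (labels[v], tuple(sorted(labels[u] for u in adj[v])))
            for v in adj
        }
        history.append(Counter(labels.values()))
    return labels, history
```

Because the pair itself serves as the label, labels computed for different graphs are directly comparable, which is convenient for the comparisons that follow.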
Consider two simple graphs:
Initial state (k=0) for two graphs. All nodes have the same initial label (gray).
After one iteration of 1-WL:

In G2, node $a_2$ has one gray neighbor ($b_2$), so $S_{a_2}^{(1)} = \{\text{gray}\}$. Node $b_2$ has two gray neighbors ($a_2$ and $c_2$), so $S_{b_2}^{(1)} = \{\text{gray}, \text{gray}\}$. Node $c_2$ has one gray neighbor ($b_2$), so $S_{c_2}^{(1)} = \{\text{gray}\}$. Nodes $a_2$ and $c_2$ therefore hash the pair (gray, {gray}) and receive a new label, say 'orange', while $b_2$ hashes (gray, {gray, gray}) and receives a new label, say 'blue'. In G1, every node has two gray neighbors, so every node hashes (gray, {gray, gray}) and receives the label 'blue'.

State after one iteration (k=1). G1 has one label type (blue), G2 has two (orange and blue). The label distributions differ, so 1-WL distinguishes these graphs.
Since the label distributions differ after k=1, the 1-WL test concludes that G1 and G2 are non-isomorphic.
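As a quick check, the snippet below applies the `wl_refine` sketch from above to this example. G2 is the path $a_2$ - $b_2$ - $c_2$ from the walkthrough; the exact structure of G1 is not spelled out above, so it is assumed here to be a triangle, consistent with every G1 node having two gray neighbors.

```python
# G2 is the path a2 - b2 - c2; G1 is *assumed* to be a triangle (every node has
# two neighbors), consistent with the walkthrough above. Node names are illustrative.
G1 = {"a1": ["b1", "c1"], "b1": ["a1", "c1"], "c1": ["a1", "b1"]}
G2 = {"a2": ["b2"], "b2": ["a2", "c2"], "c2": ["b2"]}

_, hist1 = wl_refine(G1, num_iters=1)
_, hist2 = wl_refine(G2, num_iters=1)
print(len(hist1[1]), len(hist2[1]))  # 1 2   -> one label class in G1, two in G2
print(hist1[1] == hist2[1])          # False -> 1-WL distinguishes G1 and G2
```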
Now, let's revisit the message passing framework introduced earlier:
$$h_v^{(k)} = \text{UPDATE}^{(k)}\left(h_v^{(k-1)},\ \text{AGGREGATE}^{(k)}\left(\{h_u^{(k-1)} : u \in N(v)\}\right)\right)$$
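Before drawing the analogy, here is a minimal sketch of one such message-passing layer in PyTorch. The sum aggregator, the dense adjacency-matrix representation, the two-layer MLP update, and the class name `SimpleMPNNLayer` are illustrative assumptions rather than a specific published architecture.

```python
import torch
import torch.nn as nn

class SimpleMPNNLayer(nn.Module):
    """One generic message-passing layer: sum AGGREGATE, MLP UPDATE."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        # UPDATE: combines a node's own state with its aggregated neighbor message
        self.update_mlp = nn.Sequential(
            nn.Linear(2 * in_dim, out_dim),
            nn.ReLU(),
            nn.Linear(out_dim, out_dim),
        )

    def forward(self, h, adj):
        # h:   (num_nodes, in_dim) node features h_v^{(k-1)}
        # adj: (num_nodes, num_nodes) dense adjacency matrix
        # AGGREGATE: sum over neighbor features, analogous to the multiset S_v^{(k)}
        agg = adj @ h
        # UPDATE: new state h_v^{(k)} from (own state, aggregated message)
        return self.update_mlp(torch.cat([h, agg], dim=-1))
```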
Observe the similarity:

- The AGGREGATE function collects information from neighbors, analogous to collecting the neighbor labels $l_u^{(k-1)}$ into the multiset $S_v^{(k)}$ in the WL test. Common aggregators such as sum, mean, or max operate on the multiset of neighbor features $\{h_u^{(k-1)}\}$.
- The UPDATE function combines the aggregated neighborhood information with the node's own previous state $h_v^{(k-1)}$, similar to how the WL test uses the pair $(l_v^{(k-1)}, S_v^{(k)})$ to compute the new label $l_v^{(k)}$.

This structural similarity is not coincidental. It has been formally shown that the expressive power of message-passing GNNs (MPNNs) is upper-bounded by that of the 1-WL test. This means:
- If the 1-WL test cannot distinguish two graphs, then no standard MPNN can distinguish them either, regardless of the specific AGGREGATE and UPDATE functions (as long as they follow the message-passing paradigm) or the number of layers: the node representations learned by the GNN for structurally equivalent nodes (according to 1-WL) will converge to the same values in both graphs.
- The 1-WL test, and therefore standard MPNNs, cannot distinguish certain graph structures. A classic example is distinguishing different regular graphs (graphs where all nodes have the same degree). For instance, the 1-WL test cannot distinguish a 6-node cycle graph (degree 2 for all nodes) from two disconnected 3-node cycle graphs (also degree 2 for all nodes). An MPNN would compute identical node embeddings for all nodes in both scenarios after a sufficient number of layers.
Example graphs G3 (6-cycle) and G4 (two disjoint 3-cycles). Both are 2-regular. The 1-WL test (and standard MPNNs) cannot distinguish between them, assigning the same final label/embedding to all nodes in both graphs.
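Running the `wl_refine` sketch from earlier on these two graphs illustrates the failure: the label histograms agree at every iteration, so the test never separates them. The adjacency encodings below are illustrative.

```python
# G3: a 6-node cycle; G4: two disjoint 3-node cycles. Both are 2-regular.
G3 = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
G4 = {0: [1, 2], 1: [0, 2], 2: [0, 1],
      3: [4, 5], 4: [3, 5], 5: [3, 4]}

_, hist3 = wl_refine(G3, num_iters=4)
_, hist4 = wl_refine(G4, num_iters=4)
# Identical label histograms at every iteration: 1-WL cannot tell G3 and G4 apart
print(all(h3 == h4 for h3, h4 in zip(hist3, hist4)))  # True
```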
This limitation highlights that standard GNNs capture local neighborhood structure effectively but struggle with certain global properties and with distinguishing structures that appear identical under the 1-WL refinement process.
Understanding this connection is significant for selecting or designing GNN architectures. If a task requires distinguishing graph structures beyond the capability of the 1-WL test, standard MPNN architectures like GCN or GraphSAGE might be insufficient. This motivates the development of more powerful GNNs, often inspired by higher-order WL tests (k-WL) or incorporating mechanisms like graph substructure counting or positional encodings, which we will examine in subsequent chapters. Knowing the theoretical ceiling of standard message passing helps appreciate why more advanced architectures are sometimes necessary.