A defining characteristic of a graph is that, unlike the pixels in an image or the words in a sentence, its nodes do not have a natural order. If you ask for the neighbors of a node, the set you receive, $\{u_1, u_2, u_3\}$, is identical to the set $\{u_3, u_1, u_2\}$. This lack of a canonical ordering poses a significant challenge for standard neural networks like Multi-Layer Perceptrons (MLPs), which expect their input features arranged in a fixed, ordered vector. If you were to feed the features of neighboring nodes to an MLP, the output would change depending on the arbitrary order you chose, leading to inconsistent and meaningless results.
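To make this concrete, here is a minimal NumPy sketch (a hypothetical toy setup, not code from any particular library) that concatenates three neighbor feature vectors in two different orders and passes them through the same linear layer. The two outputs disagree, which is exactly the ordering problem described above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Three neighbor feature vectors (4 features each) and one toy linear layer
# that expects a fixed-length concatenation of exactly three neighbors.
neighbors = rng.normal(size=(3, 4))
W = rng.normal(size=(12, 2))

def mlp_on_concat(feats):
    # Flatten the neighbors in the given order and apply the layer.
    return np.tanh(feats.reshape(-1) @ W)

out_a = mlp_on_concat(neighbors)             # order (u_1, u_2, u_3)
out_b = mlp_on_concat(neighbors[[2, 0, 1]])  # order (u_3, u_1, u_2), same set of neighbors

print(np.allclose(out_a, out_b))  # False: the output depends on the arbitrary ordering
```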
To operate effectively on graphs, a neural network layer must respect this fundamental property. This is achieved through two related principles: permutation invariance and permutation equivariance.
Let's first consider the AGGREGATE function in our message passing formula. Its job is to take the set of feature vectors from a node's neighbors, $\{ \mathbf{h}_u^{(l)} : u \in \mathcal{N}(v) \}$, and combine them into a single message vector, $\mathbf{m}_v^{(l)}$.
For the network to produce a consistent output regardless of how we order the neighbors, the aggregation function must be permutation invariant. A function is permutation invariant if its output remains the same even when the order of its inputs is shuffled.
Formally, for inputs $\mathbf{x}_1, \dots, \mathbf{x}_n$:

$$f(\mathbf{x}_{\pi(1)}, \mathbf{x}_{\pi(2)}, \dots, \mathbf{x}_{\pi(n)}) = f(\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_n)$$

Here, $\pi$ is any permutation of the indices $1, \dots, n$.
The aggregation functions we discussed in the previous section, such as sum, mean, and max, all have this property. For example, summing the feature vectors of neighbors will produce the same result no matter which neighbor you start with.
By using a permutation-invariant function for aggregation, we guarantee that the message generated from a node's neighborhood is based only on the set of neighbors, not on any artificial ordering we might impose.
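The following short NumPy check (illustrative only; the array shapes and random features are arbitrary choices) confirms that sum, mean, and max aggregation all produce the same message when the neighbor rows are shuffled.

```python
import numpy as np

rng = np.random.default_rng(1)
neighbors = rng.normal(size=(5, 8))        # feature vectors of 5 neighbors
shuffled = neighbors[rng.permutation(5)]   # same multiset of vectors, different order

for name, agg in [("sum",  lambda x: x.sum(axis=0)),
                  ("mean", lambda x: x.mean(axis=0)),
                  ("max",  lambda x: x.max(axis=0))]:
    same = np.allclose(agg(neighbors), agg(shuffled))
    print(name, same)   # sum True, mean True, max True
```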
While the aggregation step is invariant to the order of a single node's neighbors, the entire GNN layer exhibits a slightly different property when considering all nodes in the graph: permutation equivariance.
A function is permutation equivariant if permuting its inputs results in an identically permuted output. Let's say we have a function $f$ that takes a set of node features $\mathbf{H}$ (a matrix where each row is a node's feature vector) and outputs a new set of features $\mathbf{H}' = f(\mathbf{H})$. If we permute the nodes in the input graph (which corresponds to permuting the rows of $\mathbf{H}$ and the rows/columns of the adjacency matrix), the output features will be permuted in the exact same way.
Formally, if $\mathbf{P}$ is any permutation matrix that reorders the nodes, then a function $f$ is equivariant if:

$$f(\mathbf{P}\mathbf{A}\mathbf{P}^\top, \mathbf{P}\mathbf{H}) = \mathbf{P}\, f(\mathbf{A}, \mathbf{H})$$
Here, $\mathbf{A}$ is the adjacency matrix. The operation $\mathbf{P}\mathbf{A}\mathbf{P}^\top$ reorders the adjacency matrix according to the permutation, and $\mathbf{P}\mathbf{H}$ reorders the feature matrix. The equation states that permuting the graph's inputs and then applying the GNN layer is the same as applying the GNN layer first and then permuting its outputs.
The message passing mechanism naturally achieves this. Each node's new representation is computed based on its own state and an aggregated message from its neighbors. Since this computation is performed for every node in parallel, if we re-index the nodes of the graph, the output embeddings will be re-indexed in the same way. The representation computed for a given node stays attached to that node, no matter what position it occupies in the new ordering.
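As a sanity check, here is a minimal NumPy sketch of a generic sum-aggregation layer (the form of the layer and the weight names W_self and W_neigh are illustrative assumptions, not a specific library API) that verifies the equivariance equation $f(\mathbf{P}\mathbf{A}\mathbf{P}^\top, \mathbf{P}\mathbf{H}) = \mathbf{P}\, f(\mathbf{A}, \mathbf{H})$ numerically.

```python
import numpy as np

rng = np.random.default_rng(2)

n, d_in, d_out = 6, 4, 3
A = rng.integers(0, 2, size=(n, n))
A = np.triu(A, 1)
A = A + A.T                                  # random symmetric adjacency, no self-loops
H = rng.normal(size=(n, d_in))               # node feature matrix, one row per node
W_self = rng.normal(size=(d_in, d_out))
W_neigh = rng.normal(size=(d_in, d_out))

def gnn_layer(A, H):
    # One message passing layer: h'_v = ReLU(W_self h_v + W_neigh * sum of neighbor features).
    messages = A @ H                         # row v holds the sum over v's neighbors
    return np.maximum(0.0, H @ W_self + messages @ W_neigh)

# Build a random permutation matrix P.
perm = rng.permutation(n)
P = np.eye(n)[perm]

lhs = gnn_layer(P @ A @ P.T, P @ H)   # permute the graph, then apply the layer
rhs = P @ gnn_layer(A, H)             # apply the layer, then permute the outputs

print(np.allclose(lhs, rhs))  # True: the layer is permutation equivariant
```

The check passes because the sum aggregation inside the layer is permutation invariant for each node, and the per-node update is applied identically to every node.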
Diagram illustrating invariance and equivariance. For invariance, different orderings of neighbor features fed into the AGGREGATE function produce the same output message. For equivariance, permuting the order of all nodes in the graph before the GNN layer results in an identically permuted set of output node representations.
Permutation invariance and equivariance are not just theoretical properties. They are the reason GNNs can learn functions that depend on the graph's underlying structure, or topology, rather than on an arbitrary node ordering. A model that is not permutation equivariant might learn that "the first node in the list is important," which is a meaningless and non-generalizable pattern.
By adhering to these principles, GNNs produce outputs that depend only on the graph's structure and node features, not on the order in which the nodes happen to be listed.
In essence, these properties ensure that the GNN is learning about the graph itself, not about the way we chose to write it down in memory. This is fundamental to building powerful and reliable models for graph-structured data.