Graph Neural Networks (GNNs) generate an updated representation for a node by integrating new information, typically derived from messages gathered from its neighbors, with the node's current state. This update is a two-part process: the gathered message is first combined with the node's current state, and the combined result is then transformed. The node's new feature vector therefore contains both its own prior information and the collective wisdom of its local neighborhood.
Let's denote the aggregated message vector for a node $v$ at layer $l$ as $\mathbf{m}_v^{(l)}$. This vector is the output of the AGGREGATE function we discussed previously. The update function's job is to combine this message with the node's own feature vector from the previous layer, $\mathbf{h}_v^{(l-1)}$.
A common and effective strategy for combining these two vectors is to first concatenate them and then pass the result through a standard neural network layer. This layer consists of a linear transformation (a weight matrix) followed by a non-linear activation function.
Mathematically, this update operation can be written as:

$$\mathbf{h}_v^{(l)} = \sigma\left(\mathbf{W}^{(l)} \left[\mathbf{h}_v^{(l-1)} \,\|\, \mathbf{m}_v^{(l)}\right]\right)$$

Let's break this down:

- $\mathbf{h}_v^{(l-1)}$ is the node's feature vector from the previous layer.
- $\mathbf{m}_v^{(l)}$ is the aggregated message computed from the node's neighbors.
- $[\,\cdot \,\|\, \cdot\,]$ denotes concatenation of the two vectors.
- $\mathbf{W}^{(l)}$ is a learnable weight matrix that applies a linear transformation to the concatenated vector.
- $\sigma$ is a non-linear activation function, such as ReLU.
- $\mathbf{h}_v^{(l)}$ is the node's new representation produced by layer $l$.
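To make this concrete, here is a minimal sketch of the update step in PyTorch. The class name `ConcatUpdate`, the feature dimensions, and the choice of ReLU are illustrative assumptions rather than part of any specific library API; the aggregated messages are assumed to be precomputed and passed in as a tensor.

```python
import torch
import torch.nn as nn

class ConcatUpdate(nn.Module):
    """Update step: concatenate the node's own vector with the aggregated
    message, then apply a linear transformation and a ReLU activation."""
    def __init__(self, in_dim, msg_dim, out_dim):
        super().__init__()
        # W operates on the concatenated vector [h_v || m_v]
        self.linear = nn.Linear(in_dim + msg_dim, out_dim)
        self.activation = nn.ReLU()

    def forward(self, h_prev, messages):
        # h_prev:   (num_nodes, in_dim)  previous-layer node features
        # messages: (num_nodes, msg_dim) aggregated neighbor messages
        combined = torch.cat([h_prev, messages], dim=-1)
        return self.activation(self.linear(combined))

# Example usage with random features for 5 nodes
update = ConcatUpdate(in_dim=16, msg_dim=16, out_dim=32)
h_prev = torch.randn(5, 16)
messages = torch.randn(5, 16)
h_new = update(h_prev, messages)
print(h_new.shape)  # torch.Size([5, 32])
```

Running the forward pass on random inputs confirms the shapes: five nodes with 16-dimensional features and 16-dimensional messages yield five 32-dimensional output vectors.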
The diagram below illustrates this flow. The node's previous representation and the aggregated message are combined, processed by a linear layer, and passed through an activation function to produce the node's new representation.
The update step for a single node. The process combines the node's existing feature vector with the aggregated message, then transforms this combined vector to produce the new representation for the next layer.
An alternative approach, seen in architectures like Graph Convolutional Networks (GCNs), applies separate linear transformations to the self-vector and the aggregated message before combining them, typically with a sum. We will examine that specific formulation in the next chapter. For a general message passing network, the concatenation method is a powerful and intuitive starting point.
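For comparison, the sketch below shows that alternative style of update, assuming the same tensor shapes as before. The two weight matrices `w_self` and `w_msg` are illustrative names, and the sketch omits the degree normalization that the full GCN formulation adds.

```python
import torch
import torch.nn as nn

class SeparateTransformUpdate(nn.Module):
    """Update variant: transform the self vector and the aggregated
    message with separate weight matrices, then sum the results."""
    def __init__(self, in_dim, msg_dim, out_dim):
        super().__init__()
        self.w_self = nn.Linear(in_dim, out_dim)
        self.w_msg = nn.Linear(msg_dim, out_dim, bias=False)
        self.activation = nn.ReLU()

    def forward(self, h_prev, messages):
        # Each vector gets its own linear transformation before the sum.
        return self.activation(self.w_self(h_prev) + self.w_msg(messages))
```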
The activation function, denoted by $\sigma$, is an essential component of any neural network, and GNNs are no exception. Its purpose is to introduce non-linearity into the model.
If we were to omit the activation function, each GNN layer would simply be a sequence of linear operations (aggregation, which is often linear, followed by a linear transformation). Stacking multiple such layers would be pointless, as a sequence of linear transformations can always be mathematically reduced to a single, more complex linear transformation. The model's capacity would be severely limited, and it would fail to learn complex patterns in the data.
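A quick numerical check makes this collapse concrete: composing two arbitrary weight matrices without an activation in between gives exactly the same result as applying a single combined matrix.

```python
import torch

# Two stacked linear maps with no activation collapse into one:
# W2 @ (W1 @ x) == (W2 @ W1) @ x for every input x.
W1 = torch.randn(8, 16)
W2 = torch.randn(4, 8)
x = torch.randn(16)

stacked = W2 @ (W1 @ x)       # apply the two layers in sequence
collapsed = (W2 @ W1) @ x     # apply their product as a single layer
print(torch.allclose(stacked, collapsed, atol=1e-5))  # True
```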
The most widely used activation function in modern deep learning, including GNNs, is the Rectified Linear Unit (ReLU). It is defined as:

$$\text{ReLU}(x) = \max(0, x)$$
ReLU is computationally efficient and works well in practice. It simply replaces any negative values in the feature vector with zero, allowing the network to model non-linear relationships by selectively "activating" certain features. While other functions like LeakyReLU, Sigmoid, or Tanh exist, ReLU is a strong default choice for hidden layers in a GNN.
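Applying ReLU element-wise to a small example vector shows this behavior directly; the values below are arbitrary.

```python
import torch
import torch.nn.functional as F

# Negative entries are zeroed out; non-negative entries pass through unchanged.
features = torch.tensor([-2.0, -0.5, 0.0, 1.5, 3.0])
print(F.relu(features))  # tensor([0.0000, 0.0000, 0.0000, 1.5000, 3.0000])
```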
By combining the aggregated neighborhood message with a node's own state and then applying a non-linear transformation, a GNN layer computes a richer, higher-level representation for each node. This new representation captures both the node's individual attributes and its structural role within its local environment.