Graph Convolutional Networks (GCNs) have emerged as an effective and efficient approach for learning representations from graph data. Despite their utility, GCNs exhibit two main constraints. First, these networks are inherently transductive. A GCN model learns a specific embedding for every node present in the training graph and cannot easily generate embeddings for new nodes added after training. Second, their computation can be expensive on large graphs, particularly for nodes with many neighbors, because the entire neighborhood is processed at each layer.
GraphSAGE, short for Graph SAmple and aggreGatE, was developed to address these challenges directly. It introduces a framework that not only scales to massive graphs but also enables inductive learning, allowing the model to generalize to entirely unseen nodes.
The innovation of GraphSAGE is twofold:

1. Sampling: instead of operating on a node's full neighborhood, the model works with a small, fixed-size sample of neighbors, which keeps computation bounded.
2. Aggregating: by learning how to aggregate information from that sampled set of local neighbors, the model learns a function that generates embeddings, rather than just memorizing embeddings for existing nodes.
At each layer of the model, and for each node, GraphSAGE performs a two-step process:

1. Sample a fixed number of neighbors uniformly at random from the node's full neighbor set.
2. Aggregate the feature vectors of those sampled neighbors into a single vector using an aggregation function.
The diagram below illustrates the sampling process for a two-layer GraphSAGE model. To compute the final embedding for the central node A, the model first samples a few of its neighbors (B, C, D). Then, for each of those neighbors, it samples from their respective neighborhoods (e.g., E and F for B). The aggregation process then works from the outside in.
For node A, the computation depends on its sampled neighbors {B, C, D}, which in turn depend on their sampled neighbors. This creates a fixed-size computation graph regardless of node A's actual degree.
This sampling strategy ensures that the computational cost for each node is constant, regardless of its degree. If a node has thousands of neighbors, we still only process a small, fixed-size sample, making the algorithm highly scalable.
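To make the idea concrete, here is a minimal Python sketch of uniform fixed-size neighbor sampling. The dictionary-based adjacency structure, the function name, and the sampling-with-replacement fallback are assumptions for illustration, not the reference implementation:

```python
import random

def sample_neighbors(adj, node, num_samples):
    """Uniformly sample a fixed-size set of neighbors for `node`.

    `adj` is assumed to map each node to a list of its neighbors.
    If a node has fewer neighbors than `num_samples`, we sample with
    replacement so the output size stays fixed (one common convention).
    """
    neighbors = adj[node]
    if len(neighbors) >= num_samples:
        return random.sample(neighbors, num_samples)  # without replacement
    return random.choices(neighbors, k=num_samples)   # with replacement

# Whether "A" has three neighbors or three thousand, we process exactly two.
adj = {"A": ["B", "C", "D"], "B": ["A", "E", "F"]}
print(sample_neighbors(adj, "A", 2))
```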
Unlike GCNs, which use a fixed mean aggregation, GraphSAGE can use one of several functions to aggregate information from the sampled neighbors, and the choice of aggregator can significantly impact model performance. Let $h_u^{(k-1)}$ represent the feature vector of a neighbor node $u$ at the previous layer $k-1$, and let $\mathcal{N}(v)$ be the set of sampled neighbors for the target node $v$.
The authors of GraphSAGE proposed three primary aggregator functions:
The mean aggregator is the simplest option and is very similar to the GCN aggregator. It takes the element-wise mean of the feature vectors of all sampled neighbors.
This function is straightforward and computationally efficient.
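In the notation introduced above, the mean aggregator computes:

$$h_{\mathcal{N}(v)}^{(k)} = \operatorname{mean}\left(\left\{ h_u^{(k-1)} : u \in \mathcal{N}(v) \right\}\right)$$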
For the LSTM aggregator, the neighbors are treated as a sequence. An LSTM (Long Short-Term Memory) network, a type of recurrent neural network, processes this sequence. Because LSTMs are sensitive to input order and graphs have no natural ordering of neighbors, a random permutation of the neighbors is used during training. The LSTM aggregator is more expressive but also more complex to implement and train.
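The following PyTorch sketch shows one way to realize this idea; the class name and dimensions are illustrative assumptions rather than the paper's implementation:

```python
import torch
import torch.nn as nn

class LSTMAggregator(nn.Module):
    """Aggregates sampled neighbor features by running an LSTM over a
    randomly permuted neighbor sequence (a sketch, not a reference)."""

    def __init__(self, feat_dim, hidden_dim):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)

    def forward(self, neighbor_feats):
        # neighbor_feats: (num_sampled, feat_dim)
        perm = torch.randperm(neighbor_feats.size(0))  # random neighbor order
        seq = neighbor_feats[perm].unsqueeze(0)        # (1, num_sampled, feat_dim)
        _, (h_n, _) = self.lstm(seq)                   # take the final hidden state
        return h_n.squeeze(0).squeeze(0)               # (hidden_dim,)

agg = LSTMAggregator(feat_dim=4, hidden_dim=8)
h_agg = agg(torch.randn(3, 4))  # aggregate 3 sampled neighbors
```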
The pooling aggregator is often the most effective. Each neighbor's feature vector is first passed independently through a shared multi-layer perceptron (MLP). Following this transformation, a symmetric, element-wise pooling operation (such as max or mean pooling) is applied to aggregate the information, as in the max-pooling form below.
$$h_{\mathcal{N}(v)}^{(k)} = \max\left(\left\{ \sigma\left(W_{\text{pool}}\, h_u^{(k-1)} + b\right) : u \in \mathcal{N}(v) \right\}\right)$$

Here, $W_{\text{pool}}$ is a learnable weight matrix, $\sigma$ is a non-linear activation function, and $\max$ refers to element-wise max pooling. The use of a trainable MLP gives this aggregator significant expressive power.
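A compact PyTorch sketch of this aggregator follows; the class name and dimensions are assumptions for the example:

```python
import torch
import torch.nn as nn

class MaxPoolAggregator(nn.Module):
    """Applies a shared single-layer MLP to each neighbor independently,
    then takes the element-wise max (a sketch of the max-pooling form)."""

    def __init__(self, feat_dim, hidden_dim):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(feat_dim, hidden_dim), nn.ReLU())

    def forward(self, neighbor_feats):
        # neighbor_feats: (num_sampled, feat_dim)
        transformed = self.mlp(neighbor_feats)  # sigma(W_pool @ h_u + b) per neighbor
        return transformed.max(dim=0).values    # element-wise max over neighbors

agg = MaxPoolAggregator(feat_dim=4, hidden_dim=8)
h_agg = agg(torch.randn(3, 4))
```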
After aggregating the neighbor representations into $h_{\mathcal{N}(v)}^{(k)}$, GraphSAGE combines this vector with the target node's own representation from the previous layer, $h_v^{(k-1)}$. A key difference from GCN is that these two vectors are concatenated before being passed through a linear layer and a non-linear activation function:

$$h_v^{(k)} = \sigma\left(W^{(k)} \cdot \text{CONCAT}\left(h_v^{(k-1)},\; h_{\mathcal{N}(v)}^{(k)}\right)\right)$$

The concatenation operator explicitly preserves the node's previous representation, similar to a "skip connection" in residual networks, and has been shown to improve performance. The matrix $W^{(k)}$ contains the learnable weights for layer $k$.
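This update step maps directly to a few lines of PyTorch; again, the names and dimensions here are illustrative assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SAGEUpdate(nn.Module):
    """Concatenates a node's previous representation with the aggregated
    neighbor vector, then applies a linear layer and a non-linearity."""

    def __init__(self, feat_dim, agg_dim, out_dim):
        super().__init__()
        # The linear layer acts on the concatenated vector: feat_dim + agg_dim.
        self.linear = nn.Linear(feat_dim + agg_dim, out_dim)

    def forward(self, h_self, h_agg):
        combined = torch.cat([h_self, h_agg], dim=-1)  # concatenation, not summation
        return F.relu(self.linear(combined))

update = SAGEUpdate(feat_dim=4, agg_dim=8, out_dim=16)
h_next = update(torch.randn(4), torch.randn(8))
```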
By learning the weights of the aggregator (for the LSTM and pooling variants) and the update function, GraphSAGE learns a general function for generating node embeddings based on local neighborhood structure. This is what gives it its inductive power, a topic we will explore in the next section.