Graph Convolutional Networks (GCNs) have emerged as an effective and efficient approach for learning representations from graph data. Despite their utility, GCNs exhibit two main constraints. First, these networks are inherently transductive. A GCN model learns a specific embedding for every node present in the training graph and cannot easily generate embeddings for new nodes added after training. Second, their computation can be expensive on large graphs, particularly for nodes with many neighbors, because the entire neighborhood is processed at each layer.
GraphSAGE, short for Graph SAmple and aggreGatE, was developed to address these challenges directly. It introduces a framework that not only scales to massive graphs but also enables inductive learning, allowing the model to generalize to entirely unseen nodes.
The innovation of GraphSAGE is twofold:

1. Sampling: instead of operating on a node's full neighborhood, the model works with a small, fixed-size sample of neighbors, which keeps computation bounded.
2. Aggregating: by learning how to aggregate information from that sampled set of local neighbors, the model learns a function that generates embeddings, rather than just memorizing embeddings for existing nodes.
At each layer of the model, and for each node, GraphSAGE performs a two-step process:

1. Sample a fixed number of neighbors uniformly at random from the node's full neighbor set.
2. Aggregate the feature vectors of those sampled neighbors into a single vector using an aggregation function.
The diagram below illustrates the sampling process for a two-layer GraphSAGE model. To compute the final embedding for the central node A, the model first samples a few of its neighbors (B, C, D). Then, for each of those neighbors, it samples from their respective neighborhoods (e.g., E and F for B). The aggregation process then works from the outside in.
For node A, the computation depends on its sampled neighbors {B, C, D}, which in turn depend on their sampled neighbors. This creates a fixed-size computation graph regardless of node A's actual degree.
This sampling strategy ensures that the computational cost for each node is constant, regardless of its degree. If a node has thousands of neighbors, we still only process a small, fixed-size sample, making the algorithm highly scalable.
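To make the idea concrete, here is a minimal Python sketch of uniform fixed-size neighbor sampling. The dictionary-based adjacency structure, the function name, and the sampling-with-replacement fallback are assumptions for illustration, not the reference implementation:

```python
import random

def sample_neighbors(adj, node, num_samples):
    """Uniformly sample a fixed-size set of neighbors for `node`.

    `adj` is assumed to map each node to a list of its neighbors.
    If a node has fewer neighbors than `num_samples`, we sample with
    replacement so the output size stays fixed (one common convention).
    """
    neighbors = adj[node]
    if len(neighbors) >= num_samples:
        return random.sample(neighbors, num_samples)  # without replacement
    return random.choices(neighbors, k=num_samples)   # with replacement

# Whether "A" has three neighbors or three thousand, we process exactly two.
adj = {"A": ["B", "C", "D"], "B": ["A", "E", "F"]}
print(sample_neighbors(adj, "A", 2))
```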
Unlike GCNs, which use a fixed mean aggregation, GraphSAGE can use one of several functions to aggregate information from the sampled neighbors, and the choice of aggregator can significantly impact model performance. Let $h_u^{(k-1)}$ represent the feature vector of a neighbor node $u$ at the previous layer $k-1$, and let $\mathcal{N}(v)$ be the set of sampled neighbors for the target node $v$.
The authors of GraphSAGE proposed three primary aggregator functions:
The mean aggregator is the simplest option and is very similar to the GCN aggregator. It takes the element-wise mean of the feature vectors of all sampled neighbors.
This function is straightforward and computationally efficient.
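In the notation introduced above, the mean aggregator computes:

$$h_{\mathcal{N}(v)}^{(k)} = \operatorname{mean}\left(\left\{ h_u^{(k-1)} : u \in \mathcal{N}(v) \right\}\right)$$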
For the LSTM aggregator, the neighbors are treated as a sequence. An LSTM (Long Short-Term Memory) network, a type of recurrent neural network, processes this sequence. Because LSTMs are sensitive to input order and graphs have no natural ordering of neighbors, a random permutation of the neighbors is used during training. The LSTM aggregator is more expressive but also more complex to implement and train.
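The following PyTorch sketch shows one way to realize this idea; the class name and dimensions are illustrative assumptions rather than the paper's implementation:

```python
import torch
import torch.nn as nn

class LSTMAggregator(nn.Module):
    """Aggregates sampled neighbor features by running an LSTM over a
    randomly permuted neighbor sequence (a sketch, not a reference)."""

    def __init__(self, feat_dim, hidden_dim):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)

    def forward(self, neighbor_feats):
        # neighbor_feats: (num_sampled, feat_dim)
        perm = torch.randperm(neighbor_feats.size(0))  # random neighbor order
        seq = neighbor_feats[perm].unsqueeze(0)        # (1, num_sampled, feat_dim)
        _, (h_n, _) = self.lstm(seq)                   # take the final hidden state
        return h_n.squeeze(0).squeeze(0)               # (hidden_dim,)

agg = LSTMAggregator(feat_dim=4, hidden_dim=8)
h_agg = agg(torch.randn(3, 4))  # aggregate 3 sampled neighbors
```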
The pooling aggregator is often the most effective. Each neighbor's feature vector is first passed independently through a shared multi-layer perceptron (MLP). Following this transformation, a symmetric, element-wise pooling operation (such as max or mean pooling) is applied to aggregate the information, as in the max-pooling form below.
$$h_{\mathcal{N}(v)}^{(k)} = \max\left(\left\{ \sigma\left(W_{\text{pool}}\, h_u^{(k-1)} + b\right) : u \in \mathcal{N}(v) \right\}\right)$$

Here, $W_{\text{pool}}$ is a learnable weight matrix, $\sigma$ is a non-linear activation function, and $\max$ refers to element-wise max pooling. The use of a trainable MLP gives this aggregator significant expressive power.
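A compact PyTorch sketch of this aggregator follows; the class name and dimensions are assumptions for the example:

```python
import torch
import torch.nn as nn

class MaxPoolAggregator(nn.Module):
    """Applies a shared single-layer MLP to each neighbor independently,
    then takes the element-wise max (a sketch of the max-pooling form)."""

    def __init__(self, feat_dim, hidden_dim):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(feat_dim, hidden_dim), nn.ReLU())

    def forward(self, neighbor_feats):
        # neighbor_feats: (num_sampled, feat_dim)
        transformed = self.mlp(neighbor_feats)  # sigma(W_pool @ h_u + b) per neighbor
        return transformed.max(dim=0).values    # element-wise max over neighbors

agg = MaxPoolAggregator(feat_dim=4, hidden_dim=8)
h_agg = agg(torch.randn(3, 4))
```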
After aggregating the neighbor representations into $h_{\mathcal{N}(v)}^{(k)}$, GraphSAGE combines this vector with the target node's own representation from the previous layer, $h_v^{(k-1)}$. A key difference from GCN is that these two vectors are concatenated before being passed through a linear layer and a non-linear activation function:

$$h_v^{(k)} = \sigma\left(W^{(k)} \cdot \text{CONCAT}\left(h_v^{(k-1)},\; h_{\mathcal{N}(v)}^{(k)}\right)\right)$$

The concatenation operator explicitly preserves the node's previous representation, similar to a "skip connection" in residual networks, and has been shown to improve performance. The matrix $W^{(k)}$ contains the learnable weights for layer $k$.
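This update step maps directly to a few lines of PyTorch; again, the names and dimensions here are illustrative assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SAGEUpdate(nn.Module):
    """Concatenates a node's previous representation with the aggregated
    neighbor vector, then applies a linear layer and a non-linearity."""

    def __init__(self, feat_dim, agg_dim, out_dim):
        super().__init__()
        # The linear layer acts on the concatenated vector: feat_dim + agg_dim.
        self.linear = nn.Linear(feat_dim + agg_dim, out_dim)

    def forward(self, h_self, h_agg):
        combined = torch.cat([h_self, h_agg], dim=-1)  # concatenation, not summation
        return F.relu(self.linear(combined))

update = SAGEUpdate(feat_dim=4, agg_dim=8, out_dim=16)
h_next = update(torch.randn(4), torch.randn(8))
```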
By learning the weights of the aggregator (for the LSTM and pooling variants) and the update function, GraphSAGE learns a general function for generating node embeddings based on local neighborhood structure. This is what gives it its inductive power, a topic we will explore in the next section.