When you split data for traditional machine learning models, like a classifier for images, the process is straightforward. You randomly shuffle your dataset and divide it into training, validation, and test sets. This works because each data point, an image in this case, is assumed to be independent and identically distributed (i.i.d.). One image does not influence another.
In graph data, this assumption breaks down completely. Nodes are defined by their features and their connections to other nodes. The very essence of a GNN is to leverage these connections. If you randomly assign nodes to different sets, you risk creating a situation where the model is directly or indirectly trained on information from the test set, a problem known as data leakage. For example, a test node's features might be updated using information from a neighboring training node. This makes the evaluation unreliable and gives an overly optimistic measure of performance.
To handle this dependency, we use two distinct settings for splitting graph data: transductive and inductive.
In the transductive setting, we have access to the entire graph during training. This means all nodes and all edges are visible to the model from the start. However, we only have access to the labels for a subset of the nodes, which form our training set. The goal is to infer the labels for the remaining nodes in the same graph.
Think of it as filling in the blanks on a map you already have. You can see all the cities and roads, but only some cities are labeled with their population. Your task is to predict the population for the unlabeled cities.
During training, the GNN can pass messages across the entire graph structure. When we calculate the loss, however, we only consider the model's predictions for the nodes in the training set. The validation and test sets consist of other nodes within that same graph, and we use them to evaluate how well our model generalizes to its unseen neighbors.
How it works:
A `train_mask` indicates which nodes to use for loss calculation and backpropagation, while a `val_mask` and `test_mask` indicate which nodes to use for evaluation.

*A transductive split on a single graph. The model sees all nodes and edges. It trains on the labeled nodes (green) and is evaluated on its predictions for other nodes in the same graph (yellow for validation, red for test).*
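The mechanics can be sketched with a toy example in NumPy. The graph, features, and labels below are invented for illustration, and the single mean-aggregation step stands in for a full GNN layer; a real pipeline would use a library such as PyTorch Geometric:

```python
import numpy as np

# Toy graph: 4 nodes, adjacency with self-loops (illustrative values)
A = np.array([[1, 1, 0, 0],
              [1, 1, 1, 0],
              [0, 1, 1, 1],
              [0, 0, 1, 1]], dtype=float)
A_norm = A / A.sum(axis=1, keepdims=True)   # row-normalize for mean aggregation

X = np.array([[1.0], [2.0], [3.0], [4.0]])  # node features (1-dim for simplicity)
y = np.array([1.0, 2.0, 3.0, 4.0])          # node labels (regression targets)

# Message passing runs over the ENTIRE graph: every node, labeled or not,
# contributes to its neighbors' representations.
H = A_norm @ X

# But the loss only looks at the nodes selected by train_mask.
train_mask = np.array([True, True, False, False])
test_mask = np.array([False, False, False, True])

train_loss = np.mean((H[train_mask, 0] - y[train_mask]) ** 2)
test_loss = np.mean((H[test_mask, 0] - y[test_mask]) ** 2)  # evaluation only
```

Note that `H` is computed for all four nodes in one pass; the masks only control which predictions contribute to the loss and which are reserved for evaluation.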
Many classic node classification benchmarks, such as the Cora citation network, are evaluated using this setting. The task is to classify academic papers (nodes) given the full citation network (edges).
In the inductive setting, the model is trained on one set of nodes or graphs and is then expected to make predictions on new, completely unseen nodes or graphs. This is much closer to the standard machine learning workflow and is necessary for most production applications where new data arrives continuously.
Imagine training a model to detect fraudulent transactions. You would train it on transaction graphs from past weeks. The goal is to deploy this model to detect fraud in next week's transaction graph, which will contain new customers and new interactions. The model cannot assume it has already seen the nodes it needs to make predictions on.
To achieve this, the training, validation, and test sets must be strictly separated. If you are splitting a single large graph, this means the validation and test nodes, along with all their connecting edges, must be completely removed from the graph seen by the model during training.
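For a single large graph, this separation can be sketched as plain edge-list filtering. The node IDs and edges below are illustrative:

```python
# Edge list for a small illustrative graph; nodes 4 and 5 are held out for testing.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (1, 4)]
test_nodes = {4, 5}

# The training graph must drop the held-out nodes AND every edge touching them;
# otherwise messages from test nodes leak into training representations.
train_edges = [(u, v) for (u, v) in edges
               if u not in test_nodes and v not in test_nodes]

print(train_edges)  # only edges among training nodes remain
```

During evaluation, the full graph (or the new graph) is restored so the model can use the test nodes' own neighborhoods to make predictions.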
How it works:
*An inductive split. The model is trained on a set of graphs (left) and evaluated on a new, unseen graph (right). The test graph's structure and nodes were not available during training.*
Architectures like GraphSAGE, which use neighborhood sampling, are particularly well-suited for inductive learning because they are explicitly designed to generate embeddings for any node, regardless of whether it was seen during training.
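The sampling idea behind GraphSAGE can be sketched in a few lines of plain Python. The fixed fan-out of 2, the mean aggregator, and the unweighted combine step are illustrative simplifications of the learned operators in the real architecture:

```python
import random

# Adjacency as a lookup; works for any node we can query at inference time.
neighbors = {0: [1, 2, 3], 1: [0], 2: [0, 3], 3: [0, 2]}
features = {0: 1.0, 1: 2.0, 2: 3.0, 3: 4.0}

def sage_embedding(node, fanout=2, seed=0):
    """One GraphSAGE-style layer: sample a fixed number of neighbors,
    then combine the node's own feature with the mean of the samples."""
    rng = random.Random(seed)
    nbrs = neighbors[node]
    sampled = nbrs if len(nbrs) <= fanout else rng.sample(nbrs, fanout)
    agg = sum(features[n] for n in sampled) / len(sampled)
    return (features[node] + agg) / 2  # simple combine step (no learned weights)
```

Because the embedding is computed from a node's local neighborhood at query time rather than from a fixed per-node table, the same function applies unchanged to nodes that were never part of the training graph.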
| Aspect | Transductive Learning | Inductive Learning |
|---|---|---|
| Graph Access | The model sees the entire graph structure during training. | The model only sees the training graph(s). |
| Task Goal | Infer labels for unlabeled nodes within a fixed graph. | Generalize to make predictions on entirely new graphs or nodes. |
| Evaluation Data | Unlabeled nodes from the same graph used for training. | Completely new nodes or graphs unseen during training. |
| Common Use Case | Semi-supervised node classification on a single network. | Fraud detection, molecular property prediction, product recommendations. |
| Example Models | GCN (original formulation) | GraphSAGE, GAT (can be used in both settings) |
The choice between a transductive and an inductive setup depends entirely on your problem. If every node you will ever need to predict on already exists in a single, fixed graph, a transductive setup is appropriate. If the model must handle new nodes or entirely new graphs after deployment, you need an inductive setup.
When implementing your training pipeline, be mindful of this distinction. Using node masks for splitting is a sign of a transductive setup. Creating truly separate graph objects for your train, validation, and test sets is necessary for an inductive setup. Misinterpreting the setting can lead to data leakage and a model that fails when deployed.
© 2026 ApX Machine Learning