Machine learning with graph data structures presents unique opportunities for predictive tasks. It involves adapting familiar machine learning approaches to leverage the structure of graphs. The problems we aim to solve generally fall into three primary categories, distinguished by the level at which we want to make a prediction: the node level, the edge level, or the entire graph level.
One of the most common tasks in graph machine learning is node classification. The goal is to predict a property or label for each node in a graph. You can think of this as filling in missing information across the network. For this task, we are typically given a single graph where a fraction of the nodes are already labeled. The model's job is to use the node features and the graph's structure to predict the labels for the remaining, unlabeled nodes.
This is often framed as a semi-supervised learning problem: the model is trained on the few labeled nodes, but it also sees the unlabeled nodes and their connections, so information can propagate along edges from labeled to unlabeled nodes.
A classic example is classifying documents in a citation network. Imagine a graph where nodes are research papers and an edge exists if one paper cites another. Given that a few papers are labeled with their subject area (e.g., "Physics," "Computer Science," "Biology"), the task is to classify the rest of the papers in the network. A model would learn that papers that cite each other or are cited by similar papers are likely to belong to the same subject area.
A node classification task in a simple network. The model must predict the label of the central gray node based on its features and its connections to the labeled blue and red nodes.
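The idea of propagating labels along edges can be sketched with a simple neighbor majority vote. This is only an illustration under stated assumptions, not a Graph Neural Network: the toy citation graph, the paper names, and the `propagate_labels` helper are all hypothetical, and node features are ignored so that only the graph structure drives the prediction.

```python
from collections import Counter

# Hypothetical toy citation graph as an undirected adjacency list.
graph = {
    "p1": ["p2", "p3"],
    "p2": ["p1", "p3"],
    "p3": ["p1", "p2", "p4"],
    "p4": ["p3", "p5", "p6"],
    "p5": ["p4", "p6"],
    "p6": ["p4", "p5"],
}

# Only a fraction of the nodes come with a known label.
known_labels = {
    "p1": "Physics",
    "p2": "Physics",
    "p5": "Biology",
    "p6": "Biology",
}


def propagate_labels(graph, known_labels, max_rounds=10):
    """Assign each unlabeled node the majority label of its labeled neighbors.

    Updates are computed for all unlabeled nodes in a round before being
    applied, and rounds repeat until no new node can be labeled.
    """
    labels = dict(known_labels)
    for _ in range(max_rounds):
        updates = {}
        for node, neighbors in graph.items():
            if node in labels:
                continue  # already labeled (given or predicted earlier)
            votes = Counter(labels[n] for n in neighbors if n in labels)
            if votes:
                updates[node] = votes.most_common(1)[0][0]
        if not updates:
            break
        labels.update(updates)
    return labels


predicted = propagate_labels(graph, known_labels)
print(predicted["p3"], predicted["p4"])
```

Here `p3` is linked to two Physics papers and `p4` to two Biology papers, so each inherits the label of its neighborhood. A learned model would combine this kind of neighborhood signal with the node features instead of relying on votes alone.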
Other applications include:
Another fundamental task is link prediction, which focuses on the relationships between nodes. The objective here is to predict whether an edge that is absent from the observed graph should exist, or is likely to form, between two nodes. It addresses the question: "Given two nodes, how likely are they to be connected?"
This problem is framed by treating existing edges as positive examples and a sample of non-existent edges as negative examples. The model then learns a function that scores pairs of nodes on their likelihood of being connected.
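A minimal sketch of this framing is shown below. The toy graph (two triangles bridged by a node `G`), the variable names, and the `common_neighbor_score` helper are assumptions made for illustration. The common-neighbor count is a fixed heuristic that stands in for the scoring function a model would actually learn from the positive and negative examples.

```python
import itertools
import random

# Hypothetical toy graph: two triangles (A-B-C and D-E-F) bridged by node G.
edges = [
    ("A", "B"), ("A", "C"), ("B", "C"),
    ("D", "E"), ("D", "F"), ("E", "F"),
    ("B", "G"), ("D", "G"),
]

# Build an undirected adjacency map: node -> set of neighbors.
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

# Positive examples: the edges that exist in the graph.
positives = {frozenset(e) for e in edges}

# Candidate negatives: every node pair that is NOT connected.
all_pairs = {frozenset(p) for p in itertools.combinations(sorted(adj), 2)}
non_edges = sorted(tuple(sorted(p)) for p in all_pairs - positives)

# Sample as many negatives as positives (fixed seed for repeatability).
rng = random.Random(0)
negatives = rng.sample(non_edges, k=len(positives))


def common_neighbor_score(adj, u, v):
    """Score a node pair by how many neighbors the two nodes share."""
    return len(adj[u] & adj[v])


score_bd = common_neighbor_score(adj, "B", "D")  # B and D share neighbor G
score_ae = common_neighbor_score(adj, "A", "E")  # A and E share no neighbor
print(score_bd, score_ae)
```

Sampling negatives rather than using every non-edge keeps the two classes balanced, since in most real graphs unconnected pairs vastly outnumber connected ones.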
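A familiar application of link prediction is recommendation systems: a "friend suggestion" on a social network, for example, is a prediction that an edge should exist between two users.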
Link prediction aims to determine if an edge should exist between two nodes, such as nodes B and D, which currently belong to different communities but share a common neighbor G.
This technique is also valuable in biology for predicting undiscovered protein-protein interactions or in transportation for identifying future high-traffic routes between locations.
Finally, some problems require us to make a prediction for an entire graph. In graph classification, the task is to assign a single label to a whole graph. This is analogous to image classification, but instead of classifying a grid of pixels, we are classifying a network of nodes and edges.
This task is common in domains where the dataset consists of many individual, smaller graphs rather than one large, single graph. A model must learn to extract structural and feature-based information from an entire graph, aggregate it into a fixed-size representation, and then pass that representation to a classifier.
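The aggregation step can be sketched as a mean readout over node features. Everything below is a hedged illustration: the atom-type features, the two toy graphs, and the hand-set classifier weights are hypothetical and untrained. The sketch also skips the step where a model extracts structural information from the edges and shows only how graphs of different sizes are reduced to a fixed-size vector and then classified.

```python
def mean_readout(node_features):
    """Average the per-node feature vectors into one graph-level vector."""
    num_nodes = len(node_features)
    dim = len(node_features[0])
    return [sum(f[d] for f in node_features) / num_nodes for d in range(dim)]


def linear_classifier(graph_vector, weights, bias=0.0):
    """Return 1 if the weighted sum of the graph vector is positive, else 0."""
    score = sum(w * x for w, x in zip(weights, graph_vector)) + bias
    return 1 if score > 0 else 0


# Hypothetical one-hot atom features in the order [carbon, oxygen, nitrogen].
C, O = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]

graph_a = [C, C, O]          # a graph with 3 nodes
graph_b = [C, C, C, C, O]    # a graph with 5 nodes

vec_a = mean_readout(graph_a)
vec_b = mean_readout(graph_b)

# Arbitrary, untrained weights used only to show the final classification step.
weights = [1.0, -3.0, 0.0]

label_a = linear_classifier(vec_a, weights)
label_b = linear_classifier(vec_b, weights)
print(len(vec_a), len(vec_b), label_a, label_b)
```

Both graphs end up as three-dimensional vectors even though they contain different numbers of nodes, which is what allows a single classifier to handle the whole dataset.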
A prominent application is in chemistry and drug discovery. Molecules can be represented as graphs, where atoms are nodes and chemical bonds are edges. A graph classification model can be trained to predict properties of the molecule, such as its toxicity, solubility, or whether it will be an effective drug for a particular disease.
In graph classification, the model learns to assign a label to an entire graph, for example by distinguishing between molecular structures to predict their properties.
Other applications include:
These three tasks provide a framework for applying machine learning to structured data. The following table summarizes their objectives and outputs.
| Task | Level | Objective | Example Output |
|---|---|---|---|
| Node Classification | Node | Predict a label for each node | The category of a research paper |
| Link Prediction | Edge (Link) | Predict the existence of an edge | A "friend suggestion" binary flag (yes/no) |
| Graph Classification | Graph | Predict a label for the entire graph | The toxicity of a molecule |
While other tasks like community detection and graph regression exist, these three form the foundation for most Graph Neural Network (GNN) applications. Throughout this course, we will focus primarily on building models for node classification, as it provides an excellent basis for understanding the mechanics of GNNs.