Common Graph Machine Learning Tasks

Machine learning with graph data structures presents unique opportunities for predictive tasks. It involves adapting familiar machine learning approaches to leverage the structure of graphs. The problems we aim to solve generally fall into three primary categories, distinguished by the level at which we want to make a prediction: the node level, the edge level, or the entire graph level.

Node-Level Tasks: Node Classification

The most common task in graph machine learning is node classification. The goal is to predict a property or label for each node in a graph. You can think of this as filling in missing information across the network. For this task, we are typically given a single graph where a fraction of the nodes are already labeled. The model's job is to use the node features and the graph's structure to predict the labels for the remaining, unlabeled nodes.

This is often a semi-supervised learning problem because we use the connections between labeled and unlabeled nodes to propagate information and make predictions.

A classic example is classifying documents in a citation network. Imagine a graph where nodes are research papers and an edge exists if one paper cites another. Given that a few papers are labeled with their subject area (e.g., "Physics," "Computer Science," "Biology"), the task is to classify the rest of the papers in the network. A model would learn that papers that cite each other or are cited by similar papers are likely to belong to the same subject area.

A node classification task in a simple network. The model must predict the label of the central gray node based on its features and its connections to the labeled blue and red nodes.

Other applications include:

Social Networks: Identifying user roles or flagging malicious accounts.
Protein-Protein Interaction Networks: Predicting the function of proteins based on their interactions with other well-understood proteins.

Edge-Level Tasks: Link Prediction

Another fundamental task is link prediction, which focuses on the relationships between nodes. The objective here is to predict whether an edge is missing and should exist between two nodes. It addresses the question: "Given two nodes, how likely are they to be connected?"

This problem is framed by treating existing edges as positive examples and a sample of non-existent edges as negative examples. The model then learns a function that scores pairs of nodes on their likelihood of being connected.

A familiar application of link prediction is recommendation systems.

E-commerce: In a graph of users and products, link prediction can suggest which products a user might like, effectively recommending items by predicting a "user-buys-product" edge.
Social Media: Friend recommendations on platforms like LinkedIn or Facebook are a direct application of link prediction, where the system predicts who you are likely to know.

Link prediction aims to determine if an edge should exist between two nodes, such as nodes B and D, which currently belong to different communities but share a common neighbor G.

This technique is also valuable in biology for predicting undiscovered protein-protein interactions or in transportation for identifying future high-traffic routes between locations.

Graph-Level Tasks: Graph Classification

Finally, some problems require us to make a prediction for an entire graph. In graph classification, the task is to assign a single label to a whole graph. This is analogous to image classification, but instead of classifying a grid of pixels, we are classifying a network of nodes and edges.

This task is common in domains where the dataset consists of many individual, smaller graphs rather than one large, single graph. A model must learn to extract structural and feature-based information from an entire graph, aggregate it into a fixed-size representation, and then pass that representation to a classifier.

A prominent application is in chemistry and drug discovery. Molecules can be represented as graphs, where atoms are nodes and chemical bonds are edges. A graph classification model can be trained to predict properties of the molecule, such as its toxicity, solubility, or whether it will be an effective drug for a particular disease.

In graph classification, the model learns to assign a label to an entire graph. For example, distinguishing between molecular structures to predict their properties.

Other applications include:

Computer Vision: Classifying scene graphs to understand the high-level content of an image.
Social Network Analysis: Classifying online communities as either benign or extremist based on their communication patterns.

Summary of Tasks

These three tasks provide a framework for applying machine learning to structured data. The following table summarizes their objectives and outputs.

Task	Level	Objective	Example Output
Node Classification	Node	Predict a label for each node	The category of a research paper
Link Prediction	Edge (Link)	Predict the existence of an edge	A "friend suggestion" binary flag (yes/no)
Graph Classification	Graph	Predict a label for the entire graph	The toxicity of a molecule

While other tasks like community detection and graph regression exist, these three form the foundation for a majority of GNN applications. Throughout this course, we will focus primarily on building models for node classification, as it provides an excellent basis for understanding the mechanics of Graph Neural Networks.

Was this section helpful?

References

Graph Neural Networks, William L. Hamilton, 2020 Synthesis Lectures on Artificial Intelligence and Machine Learning, Vol. 14, No. 3 (Morgan & Claypool Publishers) DOI: 10.2200/S01045ED1V01Y202009AIM046 - A concise introduction to graph neural networks, covering fundamental concepts, model architectures, and applications for various graph machine learning tasks.
CS224W: Machine Learning with Graphs, Jure Leskovec, 2023 (Stanford University) - Official course materials from a widely recognized university course dedicated to machine learning on graphs, providing lectures and assignments on various graph tasks.