What is Graph Data?

A graph is a structure used to represent relationships between objects. Think of a social network, where people are the objects and their friendships are the relationships. Or take a molecule, where atoms are the objects and chemical bonds are the relationships. This structure of objects and connections is fundamental to many complex systems, from transportation networks to protein-protein interaction maps.

Formally, a graph $G$ is defined as a pair $G = (V, E)$, where $V$ is a set of vertices (more commonly called nodes) and $E$ is a set of edges, each of which connects a pair of nodes from $V$.
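The definition maps almost directly onto Python's built-in set types. The following is a minimal sketch, not a standard representation: the node names and the three edges are invented for illustration, and each undirected edge is modeled as a `frozenset` so that the order of its two endpoints does not matter.

```python
# Hypothetical example graph: node names and edges are illustrative only.
V = {"A", "B", "C", "D"}

# Each undirected edge is an unordered pair of nodes, so a frozenset fits.
E = {
    frozenset({"A", "B"}),
    frozenset({"B", "C"}),
    frozenset({"C", "D"}),
}

# The graph itself is just the pair (V, E).
G = (V, E)

# Every edge must connect nodes that actually belong to V.
edges_are_valid = all(edge <= V for edge in E)

print(len(V), "nodes,", len(E), "edges, valid:", edges_are_valid)
```

Because the edges are unordered, `frozenset({"B", "A"}) in E` is true even though the edge was written as A, B.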

The Components of a Graph

Let's break down the primary components you'll work with when handling graph data.

Nodes and Edges

Nodes are the fundamental entities within a graph. In a social network graph, a node represents a person. In a web graph, a node could be a single web page.

Edges are the connections between nodes. An edge between two nodes indicates that some form of relationship exists between them. For a social network, an edge might mean two people are friends. For a web graph, an edge could represent a hyperlink from one page to another.

A simple graph with five nodes (A, B, C, D, E) and four edges connecting them.
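In code, a small graph like this is often stored as an edge list and then converted to an adjacency list for neighbor lookups. The sketch below assumes a particular set of four edges, since the text only states how many there are; the helper name `build_adjacency` is also hypothetical.

```python
from collections import defaultdict

nodes = ["A", "B", "C", "D", "E"]

# Assumed edges: the description only says there are four of them.
edges = [("A", "B"), ("A", "C"), ("B", "D"), ("D", "E")]


def build_adjacency(nodes, edges):
    """Return a dict mapping each node to a sorted list of its neighbors."""
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)  # undirected: record the edge in both directions
    # Include every node, even one that has no edges at all.
    return {node: sorted(neighbors[node]) for node in nodes}


adjacency = build_adjacency(nodes, edges)
for node, nbrs in adjacency.items():
    print(node, "->", nbrs)
```

An edge list is compact and easy to store, while the adjacency list makes it cheap to ask which nodes are connected to a given node.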

Features and Attributes

For machine learning applications, a graph's structure alone is often not enough. We enrich this structure with data, known as features or attributes.

Node Features: These are attributes attached to each node. If a node represents a user in a social network, its features might include age, city of residence, or a profile summary. In a scientific paper citation network, node features could be word embeddings of the paper's abstract. This data is typically represented as a vector for each node.
Edge Features: These are attributes that describe the connection itself. For a road network graph, an edge feature could be the distance or the average travel time between two intersections (nodes). For a "friendship" edge, it might be the date the friendship was initiated.
Global Features: These are attributes that describe the graph as a whole. For a graph representing a molecule, a global feature could be its overall solubility in water.
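One way to picture the three kinds of features is as arrays aligned with the graph's structure: one row per node, one row per edge, and a single vector for the graph. The sketch below uses plain Python lists and invented values; the user names, the feature columns, and the `check_alignment` helper are all assumptions made for illustration.

```python
# Hypothetical social graph with three users; all values are invented.
node_ids = ["alice", "bob", "carol"]

# Node features: one fixed-length vector per node, rows aligned with node_ids.
# Assumed columns: [age, number_of_posts]
node_features = [
    [34.0, 120.0],
    [27.0, 15.0],
    [45.0, 300.0],
]

# Edges as pairs of indices into node_ids.
edge_index = [(0, 1), (1, 2)]

# Edge features: one vector per edge, rows aligned with edge_index.
# Assumed column: [years_since_friendship_began]
edge_features = [[3.0], [0.5]]

# Global features: a single vector describing the whole graph.
# Here (as an assumption) simply the node count and the edge count.
global_features = [float(len(node_ids)), float(len(edge_index))]


def check_alignment(node_ids, node_features, edge_index, edge_features):
    """Return True if each node and each edge has exactly one feature row."""
    nodes_ok = len(node_features) == len(node_ids)
    edges_ok = len(edge_features) == len(edge_index)
    indices_ok = all(
        0 <= i < len(node_ids) and 0 <= j < len(node_ids)
        for i, j in edge_index
    )
    return nodes_ok and edges_ok and indices_ok


print(check_alignment(node_ids, node_features, edge_index, edge_features))
```

Keeping the rows in the same order as the nodes and edges is what lets a model look up the features of node `i` or edge `k` by position.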

Common Types of Graphs

Not all graphs are the same. Their properties can vary, and understanding these distinctions is important for modeling them correctly.

Directed and Undirected Graphs

In an undirected graph, edges are bidirectional. If node A is connected to node B, then B is also connected to A. A Facebook friendship is a good example; the relationship is mutual.

In a directed graph, edges have a direction. A connection from node A to node B does not imply a connection from B to A. Think of Twitter, where you can "follow" someone without them following you back. These directed edges are often drawn with arrows.

An undirected relationship is mutual, while a directed relationship has a specific origin and destination.
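The difference shows up clearly in an adjacency matrix: an undirected graph always produces a symmetric matrix, while a directed graph generally does not. The following sketch builds both from the same edge list; the function names and the three-node example are hypothetical.

```python
def adjacency_matrix(num_nodes, edges, directed):
    """Build a 0/1 adjacency matrix as a list of lists."""
    matrix = [[0] * num_nodes for _ in range(num_nodes)]
    for src, dst in edges:
        matrix[src][dst] = 1
        if not directed:
            matrix[dst][src] = 1  # mirror the edge for an undirected graph
    return matrix


def is_symmetric(matrix):
    """Return True if matrix[i][j] == matrix[j][i] for every pair (i, j)."""
    n = len(matrix)
    return all(matrix[i][j] == matrix[j][i] for i in range(n) for j in range(n))


# Illustrative edges between three nodes: 0 -> 1 and 1 -> 2.
edges = [(0, 1), (1, 2)]

undirected = adjacency_matrix(3, edges, directed=False)
directed = adjacency_matrix(3, edges, directed=True)

print("undirected symmetric:", is_symmetric(undirected))
print("directed symmetric:", is_symmetric(directed))
```

In the directed version, node 0 points to node 1 but the entry for node 1 pointing back to node 0 stays at zero, which mirrors a "follow" that is not returned.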

Weighted and Unweighted Graphs

In an unweighted graph, all edges are treated equally. Their existence simply indicates a connection. In contrast, a weighted graph assigns a numerical weight to each edge to represent the strength or cost of the connection. For example, in a map of airline routes, edge weights could represent the flight distance or ticket price between two cities.
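A weighted graph can be sketched as a mapping from each edge to its weight. In the example below, the airport codes and distances are made-up placeholders rather than real data, and `edge_weight` and `path_cost` are hypothetical helpers that add up the weights along a route.

```python
# Hypothetical routes; codes and distances are placeholders, not real figures.
route_km = {
    ("AAA", "BBB"): 500,
    ("BBB", "CCC"): 300,
    ("AAA", "CCC"): 900,
}


def edge_weight(u, v, weights):
    """Look up an undirected edge weight regardless of endpoint order."""
    if (u, v) in weights:
        return weights[(u, v)]
    return weights[(v, u)]


def path_cost(path, weights):
    """Sum the weights of consecutive edges along a list of nodes."""
    return sum(edge_weight(u, v, weights) for u, v in zip(path, path[1:]))


direct = path_cost(["AAA", "CCC"], route_km)
via_bbb = path_cost(["AAA", "BBB", "CCC"], route_km)

print("direct:", direct, "km; via BBB:", via_bbb, "km")
```

With these placeholder numbers the two-leg route is shorter than the direct one, a distinction an unweighted graph could not express because it would only count the number of edges.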

Homogeneous and Heterogeneous Graphs

A homogeneous graph is one where all nodes and edges are of the same type. The Cora citation network, a common benchmark dataset, is homogeneous: all nodes are research papers, and all edges are citations.

A heterogeneous graph contains nodes or edges of different types. An e-commerce platform, for example, might have nodes for Users, Products, and Brands. The edges could also be of different types, such as user-BUYS-product, user-RATES-product, and brand-PRODUCES-product. These graphs represent more complex systems and usually call for GNN architectures designed to handle multiple node and edge types.
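A simple way to sketch the e-commerce example is to group nodes by type and to key each edge list by a (source type, relation, destination type) triple. The identifiers such as `u1` and `p1`, and the helpers `is_consistent` and `targets`, are hypothetical and exist only for this illustration.

```python
# Nodes grouped by type; the identifiers are invented for illustration.
node_types = {
    "user": ["u1", "u2"],
    "product": ["p1", "p2"],
    "brand": ["b1"],
}

# Edges grouped by (source type, relation, destination type).
typed_edges = {
    ("user", "BUYS", "product"): [("u1", "p1"), ("u2", "p2")],
    ("user", "RATES", "product"): [("u1", "p2")],
    ("brand", "PRODUCES", "product"): [("b1", "p1"), ("b1", "p2")],
}


def is_consistent(node_types, typed_edges):
    """Return True if every edge endpoint exists under its declared type."""
    for (src_type, _relation, dst_type), pairs in typed_edges.items():
        for src, dst in pairs:
            if src not in node_types[src_type]:
                return False
            if dst not in node_types[dst_type]:
                return False
    return True


def targets(node, relation, typed_edges):
    """List the destination nodes reached from `node` via `relation`."""
    found = []
    for (_src_type, rel, _dst_type), pairs in typed_edges.items():
        if rel == relation:
            found.extend(dst for src, dst in pairs if src == node)
    return found


print(is_consistent(node_types, typed_edges))
print(targets("u1", "BUYS", typed_edges), targets("b1", "PRODUCES", typed_edges))
```

Keeping each relation in its own edge list makes it straightforward to treat BUYS, RATES, and PRODUCES differently, which is the property a model for heterogeneous graphs relies on.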

Understanding these foundational properties of graphs is the first step toward applying machine learning. This structure of nodes, edges, and features is how we represent interconnected data, preparing it for the specialized models we will build in later chapters.
