At its core, a graph is a structure used to represent relationships between objects. Think of a social network, where people are the objects and their friendships are the relationships. Or consider a molecule, where atoms are the objects and chemical bonds are the relationships. This structure of objects and connections is fundamental to many complex systems, from transportation networks to protein-protein interaction maps.
Formally, a graph is defined as a pair $G = (V, E)$, where $V$ is a set of vertices (more commonly called nodes), and $E$ is a set of edges that represent connections between pairs of nodes.
Let's break down the primary components you'll work with when handling graph data.
Nodes are the fundamental entities within a graph. In a social network graph, a node represents a person. In a web graph, a node could be a single web page.
Edges are the connections between nodes. An edge between two nodes indicates that some form of relationship exists between them. For a social network, an edge might mean two people are friends. For a web graph, an edge could represent a hyperlink from one page to another.
A simple graph with five nodes (A, B, C, D, E) and four edges connecting them.
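As a minimal sketch, a five-node graph like this one can be written down directly as a node set and an edge set, named `V` and `E` here by the usual convention. The figure's exact edges are not reproduced in the text, so the four edges below are assumed purely for illustration, as is the `adjacency` helper structure.

```python
# Hypothetical five-node graph; the specific edges are assumed for illustration.
V = {"A", "B", "C", "D", "E"}
E = {("A", "B"), ("A", "C"), ("B", "D"), ("D", "E")}

# Every edge must connect two nodes that belong to V.
assert all(u in V and v in V for u, v in E)

# Adjacency list for an undirected reading of the edges:
# each node maps to the set of its neighbors.
adjacency = {node: set() for node in V}
for u, v in E:
    adjacency[u].add(v)
    adjacency[v].add(u)

print(len(V), len(E))          # 5 4
print(sorted(adjacency["A"]))  # ['B', 'C']
```

The set-of-pairs form mirrors the formal definition, while the adjacency list is usually more convenient when you need to look up a node's neighbors.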
For machine learning applications, a graph's structure alone is often not enough. We enrich this structure with data attached to its nodes and edges, known as features or attributes.
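One common way to hold node features is a matrix with one row per node and one column per feature. The sketch below assumes a tiny social graph in which each person has two made-up features (age and number of posts); the names `node_features`, `node_index`, and `X` are illustrative, not part of any library.

```python
# Hypothetical node features: [age, number_of_posts] for each person.
node_features = {
    "A": [34.0, 120.0],
    "B": [27.0, 45.0],
    "C": [41.0, 300.0],
}

# Fix a node ordering so that row i of the matrix belongs to node_order[i].
node_order = sorted(node_features)
node_index = {node: i for i, node in enumerate(node_order)}

# Feature matrix X: one row per node, one column per feature.
X = [node_features[node] for node in node_order]

num_nodes, num_features = len(X), len(X[0])
print(num_nodes, num_features)  # 3 2
print(X[node_index["B"]])       # [27.0, 45.0]
```

Keeping an explicit node-to-row mapping matters because the graph's edges refer to nodes, while the model sees only row positions.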
Not all graphs are the same. Their properties can vary, and understanding these distinctions is important for modeling them correctly.
In an undirected graph, edges are bidirectional. If node A is connected to node B, then B is also connected to A. A Facebook friendship is a good example; the relationship is mutual.
In a directed graph, edges have a direction. A connection from node A to node B does not imply a connection from B to A. Think of Twitter, where you can "follow" someone without them following you back. These directed edges are often drawn with arrows.
An undirected relationship is mutual, while a directed relationship has a specific origin and destination.
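The difference shows up as soon as you build an adjacency structure from an edge list. In this sketch, `build_adjacency` is a hypothetical helper and the people are invented; the same edge list is read once as mutual friendships and once as one-way follows.

```python
def build_adjacency(edges, directed):
    """Return a dict mapping each node to the set of nodes it points to."""
    adjacency = {}
    for source, target in edges:
        adjacency.setdefault(source, set()).add(target)
        adjacency.setdefault(target, set())
        if not directed:
            # Undirected: store the edge in both directions.
            adjacency[target].add(source)
    return adjacency

edges = [("alice", "bob"), ("bob", "carol")]

friends = build_adjacency(edges, directed=False)  # mutual relationship
follows = build_adjacency(edges, directed=True)   # one-way relationship

print("alice" in friends["bob"])  # True
print("alice" in follows["bob"])  # False
```

In the undirected case the adjacency structure is symmetric by construction; in the directed case, `bob` follows `carol` but not `alice`.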
In an unweighted graph, all edges are treated equally. Their existence simply indicates a connection. In contrast, a weighted graph assigns a numerical weight to each edge to represent the strength or cost of the connection. For example, in a map of airline routes, edge weights could represent the flight distance or ticket price between two cities.
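A weighted graph can be sketched as a mapping from edges to numbers. The route names and distances below are placeholders rather than real data, and `weight` and `path_cost` are hypothetical helpers written for this example.

```python
# Hypothetical airline routes; distances in km are illustrative, not real data.
route_km = {
    ("X", "Y"): 500,
    ("Y", "Z"): 300,
    ("X", "Z"): 900,
}

def weight(u, v):
    """Look up an undirected edge weight regardless of endpoint order."""
    if (u, v) in route_km:
        return route_km[(u, v)]
    return route_km[(v, u)]

def path_cost(path):
    """Sum the edge weights along a sequence of nodes."""
    return sum(weight(u, v) for u, v in zip(path, path[1:]))

# Dropping the weights leaves the unweighted version of the same graph.
unweighted_edges = set(route_km)

print(path_cost(["X", "Y", "Z"]))  # 800
print(path_cost(["X", "Z"]))       # 900
```

In the unweighted view the direct route from X to Z looks best because it uses a single edge, but once weights are considered the two-hop route is shorter.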
A homogeneous graph is one where all nodes and edges are of the same type. The Cora citation network, a common benchmark dataset, is homogeneous: all nodes are research papers, and all edges are citations.
A heterogeneous graph contains nodes or edges of different types. Consider an e-commerce platform: you might have nodes for Users, Products, and Brands. The edges could also be of different types, such as user-BUYS-product, user-RATES-product, and brand-PRODUCES-product. These graphs represent more complex systems and typically call for graph neural network (GNN) architectures designed to handle multiple node and edge types.
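A heterogeneous graph can be sketched by storing a type for every node and a relation name on every edge. The node IDs and edges below are invented for the e-commerce example, and grouping edges by a (source type, relation, target type) triple is one possible layout, not a required one.

```python
from collections import defaultdict

# Hypothetical typed nodes for an e-commerce graph.
node_types = {
    "u1": "User",
    "u2": "User",
    "p1": "Product",
    "p2": "Product",
    "b1": "Brand",
}

# Each edge is (source, relation, target).
typed_edges = [
    ("u1", "BUYS", "p1"),
    ("u2", "BUYS", "p1"),
    ("u1", "RATES", "p2"),
    ("b1", "PRODUCES", "p1"),
    ("b1", "PRODUCES", "p2"),
]

# Group edges by (source type, relation, target type).
edges_by_type = defaultdict(list)
for source, relation, target in typed_edges:
    key = (node_types[source], relation, node_types[target])
    edges_by_type[key].append((source, target))

print(sorted(edges_by_type))
print(edges_by_type[("User", "BUYS", "Product")])  # [('u1', 'p1'), ('u2', 'p1')]
```

Grouping by edge type makes it straightforward to process each relation separately, which is what type-aware models generally need.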
Understanding these foundational properties of graphs is the first step toward applying machine learning. This structure of nodes, edges, and features is how we represent interconnected data, preparing it for the specialized models we will build in later chapters.
© 2026 ApX Machine Learning