Representing Graphs: Adjacency and Feature Matrices

To apply machine learning models to graphs, we first need to convert their abstract structure of nodes and edges into a numerical format that algorithms can process. Just as we represent images as grids of pixel values or text as sequences of numerical vectors, we need a standard way to encode graphs. This is accomplished primarily through two matrices: the adjacency matrix, which captures the graph's topology, and the feature matrix, which holds the attributes of each node.

Representing Graph Structure: The Adjacency Matrix

The most direct way to represent the connections within a graph is with an adjacency matrix, typically denoted as $A$. For a graph with $N$ nodes, the adjacency matrix is a square matrix of size $N \times N$.

The rule for populating this matrix is straightforward. For an unweighted graph, the element $A_{ij}$ in the $i$-th row and $j$-th column is:

$$A_{ij} = \begin{cases} 1 & \text{if there is an edge between node } i \text{ and node } j \\ 0 & \text{otherwise} \end{cases}$$

In a simple graph, which has no self-loops, a node is not connected to itself, so the diagonal elements $A_{ii}$ are 0.

For example, consider the simple social network below, with four individuals.

An undirected graph with four nodes (0-3) representing individuals and edges representing friendships.

The corresponding adjacency matrix $A$ for this 4-node graph would be:

$$A = \begin{pmatrix} 0 & 1 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ 1 & 0 & 0 & 1 \\ 0 & 1 & 1 & 0 \end{pmatrix}$$
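As a minimal sketch, the same matrix can be built from a list of edges. The edge list below is read off the nonzero entries of the matrix above; the helper name `build_adjacency` is illustrative, and plain nested lists are used only to keep the example dependency-free (an array library would be the usual choice in practice).

```python
# Undirected edges read off the example matrix (node ids 0-3).
edges = [(0, 1), (0, 2), (1, 3), (2, 3)]
num_nodes = 4


def build_adjacency(num_nodes, edges):
    """Return an N x N adjacency matrix (nested lists) for an undirected, unweighted graph."""
    A = [[0] * num_nodes for _ in range(num_nodes)]
    for i, j in edges:
        A[i][j] = 1  # entry for the pair (i, j)
        A[j][i] = 1  # mirrored entry, because the graph is undirected
    return A


A = build_adjacency(num_nodes, edges)
for row in A:
    print(row)
```

Each undirected edge sets two entries, which is why four friendships produce eight ones in the matrix.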

Notice a few properties of this matrix:

Symmetry: Because the graph is undirected (a friendship between Ann, node 0, and Ben, node 1, is the same as a friendship between Ben and Ann), the matrix is symmetric, meaning $A_{ij} = A_{ji}$. If the graph were directed (e.g., representing a "follows" relationship), the matrix would not necessarily be symmetric.
Sparsity: Half of the entries in this small matrix are already zero. In most large networks, such as social networks or citation networks, each node is connected to only a small fraction of all other nodes, so the adjacency matrix is far sparser still, and storing all $N^2$ entries densely wastes memory. In practice, graph data is often stored as an adjacency list or as an edge list (for example, in the COO sparse format), which records only the pairs of connected nodes. However, the adjacency matrix remains the standard mathematical representation for developing GNN theory.
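Both properties can be checked directly on the 4-node example. The sketch below tests symmetry, measures the fraction of nonzero entries, and converts the matrix to a COO-style pair of index lists; the variable names (`rows`, `cols`, `density`) are illustrative choices, not a specific library's API.

```python
A = [
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
]
n = len(A)

# Symmetry: A[i][j] must equal A[j][i] for every pair (i, j).
is_symmetric = all(A[i][j] == A[j][i] for i in range(n) for j in range(n))

# Density: fraction of entries that are nonzero.
nonzero = sum(1 for row in A for value in row if value != 0)
density = nonzero / (n * n)

# COO-style storage: parallel lists with the row and column index of each nonzero entry.
rows = [i for i in range(n) for j in range(n) if A[i][j] != 0]
cols = [j for i in range(n) for j in range(n) if A[i][j] != 0]

print(is_symmetric)  # True
print(density)       # 0.5
print(rows)          # [0, 0, 1, 1, 2, 2, 3, 3]
print(cols)          # [1, 2, 0, 3, 0, 3, 1, 2]
```

The COO lists store one pair of indices per nonzero entry, so their length grows with the number of edges rather than with $N^2$.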

For weighted graphs, where edges have different strengths (e.g., interaction frequency), the matrix entries $A_{ij}$ would contain the edge weight instead of just 1.
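A weighted version might look like the sketch below. The weights are hypothetical interaction counts invented for illustration; only the edge pattern comes from the example graph.

```python
# (node_i, node_j, weight) triples; the weights are made-up illustrative values.
weighted_edges = [(0, 1, 5.0), (0, 2, 2.0), (1, 3, 1.0), (2, 3, 4.0)]
num_nodes = 4

W = [[0.0] * num_nodes for _ in range(num_nodes)]
for i, j, weight in weighted_edges:
    W[i][j] = weight
    W[j][i] = weight  # undirected: same weight in both directions

for row in W:
    print(row)
```

A zero entry still means "no edge", so this encoding assumes that no real edge carries a weight of exactly zero.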

Representing Node Attributes: The Feature Matrix

In most applications, nodes themselves contain useful information. A user in a social network has a profile (age, location). A protein in a biological network has chemical properties. This information is stored in a node feature matrix, commonly denoted as $X$.

For a graph with $N$ nodes and $F$ features for each node, the feature matrix $X$ has dimensions $N \times F$. Each row $i$ of the matrix corresponds to node $i$ and contains its feature vector.

Let's assign two features to each person in our example graph: their age and the group they belong to (encoded as 0 or 1).

Node 0 (Ann): Age 25, Group 0
Node 1 (Ben): Age 30, Group 0
Node 2 (Chloe): Age 22, Group 1
Node 3 (Dan): Age 28, Group 1

This information can be organized into a $4 \times 2$ feature matrix $X$:

$$X = \begin{pmatrix} 25 & 0 \\ 30 & 0 \\ 22 & 1 \\ 28 & 1 \end{pmatrix}$$

The first column represents age, and the second represents group membership. This matrix provides the initial state or attributes for each node before any learning occurs. The goal of a GNN is often to use these features, along with the graph structure, to learn more expressive representations of the nodes.
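One way to assemble such a matrix from per-node records is sketched below. The record layout (a list of dictionaries) and the names `people` and `feature_names` are assumptions made for illustration; the values are those of the example.

```python
# Node attributes from the example, listed in node-id order (index = node id).
people = [
    {"name": "Ann", "age": 25, "group": 0},
    {"name": "Ben", "age": 30, "group": 0},
    {"name": "Chloe", "age": 22, "group": 1},
    {"name": "Dan", "age": 28, "group": 1},
]
feature_names = ["age", "group"]  # fixes the column order of X

# Row i of X is the feature vector of node i.
X = [[person[name] for name in feature_names] for person in people]

num_nodes, num_features = len(X), len(X[0])
print((num_nodes, num_features))  # (4, 2)
print(X[2])                       # [22, 1], the feature vector of node 2 (Chloe)
```

The row order of $X$ must match the node numbering used for $A$, since both matrices refer to nodes by index.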

The Complete Picture

Together, the adjacency matrix $A$ and the node feature matrix $X$ provide a complete numerical representation of an attributed graph. They serve as the two primary inputs to nearly all Graph Neural Network models.

$A$ ($N \times N$) tells the model who is connected to whom.
$X$ ($N \times F$) tells the model what each node is like.

The core operation of a GNN, which we will examine in the next chapter, involves using the structure defined by $A$ to propagate and transform the information contained in $X$. This allows each node to learn from its neighbors, integrating both its own attributes and its local network context.
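As a rough preview of how $A$ can route the information in $X$, the sketch below computes the product $AX$, which replaces each node's row with the sum of its neighbors' feature vectors. This is only the simplest possible aggregation, chosen here for illustration: it has no learned weights, normalization, or nonlinearity, so it should not be read as a full GNN layer. The helper `matmul` is a hypothetical name.

```python
A = [
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
]
X = [
    [25, 0],
    [30, 0],
    [22, 1],
    [28, 1],
]


def matmul(A, X):
    """Multiply an N x N matrix by an N x F matrix (both given as nested lists)."""
    n, f = len(X), len(X[0])
    return [
        [sum(A[i][k] * X[k][c] for k in range(n)) for c in range(f)]
        for i in range(len(A))
    ]


H = matmul(A, X)  # row i = sum of the feature vectors of node i's neighbors
for node, row in enumerate(H):
    print(node, row)
# 0 [52, 1]
# 1 [53, 1]
# 2 [53, 1]
# 3 [52, 1]
```

For node 0, whose neighbors are nodes 1 and 2, the result is $[30 + 22,\ 0 + 1] = [52, 1]$. Because the diagonal of $A$ is zero, a node's own features do not appear in its row of $AX$.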

In some graphs, edges also have features. For example, in a molecular graph, the edges representing chemical bonds can have types (single, double). This information is typically stored in a separate edge feature tensor, adding a third component to the graph's representation. For now, we will focus on the fundamental pairing of $A$ and $X$.
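One common layout, sketched below under stated assumptions, keeps an edge list and a parallel list of edge feature vectors, so that entry $k$ of the feature list describes edge $k$. The three-atom fragment, the bond vocabulary, and the one-hot encoding are all hypothetical choices made for illustration.

```python
# Hypothetical three-atom fragment: bond 0-1 is single, bond 1-2 is double.
edge_list = [(0, 1), (1, 2)]
bond_types = ["single", "double"]   # bond_types[k] belongs to edge_list[k]
bond_vocab = ["single", "double"]   # fixes the column order of the encoding


def one_hot(label, vocab):
    """Encode a categorical label as a 0/1 vector over the vocabulary."""
    return [1 if label == entry else 0 for entry in vocab]


# E x F_e edge feature matrix: one row per edge, one column per bond type.
edge_features = [one_hot(bond, bond_vocab) for bond in bond_types]

for edge, features in zip(edge_list, edge_features):
    print(edge, features)
# (0, 1) [1, 0]
# (1, 2) [0, 1]
```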

