To process graph data effectively with machine learning models, especially Graph Neural Networks, converting the graph's structure and associated information into suitable numerical formats is essential. The specific representation of the graph profoundly influences how messages are computed and what properties the model can learn. This section examines the standard and more advanced ways graphs are encoded for computation.
The most fundamental way to represent the connectivity of a graph $G = (V, E)$, where $V$ is the set of nodes (vertices) and $E$ is the set of edges, is the Adjacency Matrix.
For a graph with $N$ nodes, the adjacency matrix $A$ is an $N \times N$ matrix where:

$$A_{ij} = \begin{cases} 1 & \text{if there is an edge between node } i \text{ and node } j \\ 0 & \text{otherwise} \end{cases}$$
For weighted graphs, $A_{ij}$ can represent the weight of the edge between node $i$ and node $j$. For undirected graphs, $A$ is symmetric ($A_{ij} = A_{ji}$).
A simple undirected graph with 5 nodes and 5 edges.
For the graph above, the adjacency matrix (assuming nodes are indexed 0 to 4) would be:
"Graphs are often sparse, meaning most entries in are zero. This sparsity is significant for computational efficiency. Storing and performing operations on dense matrices becomes prohibitive for large . Therefore, sparse matrix formats (like Coordinate List (COO), Compressed Sparse Row (CSR), or Compressed Sparse Column (CSC)) are typically used in GNN libraries. We will discuss these formats further in Chapter 5. The adjacency matrix directly informs the neighborhood used in the message passing framework."
While the adjacency matrix captures direct connections, Graph Laplacians encode information about node degrees and graph structure in a way that's particularly useful for analyzing graph properties and forming the basis of spectral GNNs.
First, we define the Degree Matrix $D$. This is an $N \times N$ diagonal matrix where $D_{ii} = \deg(i)$ is the degree of node $i$ (the number of edges connected to it), and all off-diagonal entries are zero.
Using $A$ and $D$, we can define several forms of the Graph Laplacian:
Combinatorial Laplacian ($L = D - A$):
This is perhaps the most basic form. Its properties are linked to graph connectivity. For example, the number of connected components in the graph is equal to the multiplicity of the eigenvalue 0 of $L$. It's a symmetric positive semi-definite matrix.
Symmetric Normalized Laplacian ($L_{sym} = I - D^{-1/2} A D^{-1/2}$):
Here, $I$ is the identity matrix, and $D^{-1/2}$ is the diagonal matrix with entries $1/\sqrt{D_{ii}}$. This normalization keeps the matrix symmetric and ensures its eigenvalues are bounded between 0 and 2. This bounding is advantageous when designing spectral filters for GNNs, as it prevents numerical instability. Note that computing $D^{-1/2}$ requires $D_{ii} > 0$. Nodes with degree 0 might need special handling depending on the application.
Random Walk Normalized Laplacian ($L_{rw} = I - D^{-1} A$):
This version is generally not symmetric. It's closely related to the transition matrix $P = D^{-1} A$ of a random walk on the graph. The term $D^{-1} A$ averages neighbor features, normalized by the aggregating node's degree, which is the mean-aggregation step at the heart of simple Graph Convolutional Networks (GCNs).
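The following is a minimal sketch of how these three Laplacians can be computed with NumPy, assuming a dense adjacency matrix for an undirected graph with no isolated (degree-0) nodes:

```python
import numpy as np

def graph_laplacians(A: np.ndarray):
    """Return (L, L_sym, L_rw) for a dense adjacency matrix A.

    Assumes an undirected graph with no isolated nodes (all degrees > 0).
    """
    deg = A.sum(axis=1)                  # node degrees
    D = np.diag(deg)                     # degree matrix
    I = np.eye(A.shape[0])
    L = D - A                            # combinatorial Laplacian
    D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    L_sym = I - D_inv_sqrt @ A @ D_inv_sqrt  # symmetric normalized Laplacian
    D_inv = np.diag(1.0 / deg)
    L_rw = I - D_inv @ A                 # random walk normalized Laplacian
    return L, L_sym, L_rw
```

As a quick sanity check, `np.linalg.eigvalsh(L_sym)` should return eigenvalues lying in the interval $[0, 2]$.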
Laplacians capture relational information differently than the adjacency matrix. Their spectral properties (eigenvalues and eigenvectors) reveal significant structural information about the graph, such as connectivity and partitioning, forming the foundation of spectral graph theory and spectral GNNs, which we will explore further in Section 1.3 and Chapter 2.
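To make the connection between spectral properties and connectivity concrete, the sketch below counts connected components as the multiplicity of the (near-)zero eigenvalues of the combinatorial Laplacian; a tolerance is needed because the eigenvalues are computed in floating point.

```python
import numpy as np

def num_connected_components(L: np.ndarray, tol: float = 1e-8) -> int:
    """Count connected components via the multiplicity of eigenvalue 0 of L."""
    eigenvalues = np.linalg.eigvalsh(L)    # L is symmetric, so eigvalsh applies
    return int(np.sum(eigenvalues < tol))  # eigenvalues within tolerance of zero
```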
Graphs in machine learning contexts rarely consist of only structure. Nodes and edges often carry attributes or features that provide valuable information.
Node features are typically represented as an $N \times F$ matrix $X$, where $N$ is the number of nodes and $F$ is the number of features per node. Each row $X_i$ corresponds to the feature vector of node $i$.
These features can be, for example, numerical measurements, categorical attributes, or pretrained embeddings, depending on the application domain.
The nature and quality of node features are critical for GNN performance. If features are absent, methods like assigning constant values, unique one-hot identifiers, or using structural features might be employed. Feature processing techniques like normalization or standardization are often applied. The initial node feature matrix $X$ typically serves as the input to the message passing framework.
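The sketch below illustrates two of the points above: standardizing an existing feature matrix, and falling back to one-hot identifiers when no features are available. The helper name `prepare_node_features` is hypothetical.

```python
from typing import Optional

import numpy as np

def prepare_node_features(X: Optional[np.ndarray], num_nodes: int) -> np.ndarray:
    """Return a standardized feature matrix, or one-hot identifiers if X is None."""
    if X is None:
        # Fallback when no features exist: unique one-hot identifiers (N x N identity).
        return np.eye(num_nodes, dtype=np.float32)
    # Standardize each feature column to zero mean and unit variance.
    mean = X.mean(axis=0, keepdims=True)
    std = X.std(axis=0, keepdims=True) + 1e-8  # guard against zero variance
    return ((X - mean) / std).astype(np.float32)
```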
Edges can also possess features, especially in contexts like knowledge graphs (where edges represent relations), molecular graphs (bond types), or transportation networks (road types, travel times). These can be represented in various ways, often as a tensor of shape $|E| \times F_e$, where $|E|$ is the number of edges and $F_e$ is the number of edge features, or integrated directly into the message passing computation based on the specific edge connecting two nodes. Handling edge features requires specific GNN architectures, such as Relational GCN (RGCN), which we will cover in Chapter 4.
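One common storage convention, used for example by PyTorch Geometric, pairs an edge index with a per-edge feature matrix. The sketch below shows the shapes involved; the specific edges and feature values are assumed purely for illustration.

```python
import numpy as np

# Assumed example: 3 directed edges, each with 2 edge features (e.g. relation type, weight).
edge_index = np.array([[0, 1, 2],    # source node of each edge
                       [1, 2, 0]])   # target node of each edge
edge_attr = np.array([[1.0, 0.5],
                      [0.0, 1.2],
                      [1.0, 0.7]], dtype=np.float32)  # shape: (num_edges, F_e)
```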
The choice of representation ($A$, $L$, $L_{sym}$, $L_{rw}$, etc.) and the handling of features ($X$ and edge attributes) directly impact how messages are computed, which structural properties the model can capture, and the computational cost of training and inference.
Understanding these different ways to encode graph structure and features numerically is the first step towards building and analyzing the advanced GNN architectures discussed in the subsequent chapters. We now have the vocabulary to discuss message passing and spectral methods with greater precision.