To process graph data effectively with machine learning models, especially Graph Neural Networks, we first need to convert the graph's structure and associated information into suitable numerical formats. While Chapter 1 revisited the core message passing idea, the specific representation of the graph profoundly influences how these messages are computed and what properties of the graph the model can learn. Let's examine the standard and more advanced ways graphs are encoded for computation.
The most fundamental way to represent the connectivity of a graph G=(V,E), where V is the set of nodes (vertices) and E is the set of edges, is the Adjacency Matrix.
For a graph with N=∣V∣ nodes, the adjacency matrix A is an N×N matrix where:
$$A_{ij} = \begin{cases} 1 & \text{if } (v_i, v_j) \in E \\ 0 & \text{otherwise} \end{cases}$$

For weighted graphs, Aij can represent the weight of the edge between node vi and node vj. For undirected graphs, A is symmetric (A = Aᵀ).
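To make the definition concrete, here is a minimal NumPy sketch (the function name and signature are illustrative, not taken from any particular library) that builds a dense adjacency matrix from an undirected edge list:

```python
import numpy as np

def dense_adjacency(num_nodes, edges, weights=None):
    """Build a dense adjacency matrix for an undirected graph.

    edges:   iterable of (i, j) pairs with 0-based node indices.
    weights: optional edge weights; defaults to 1 for an unweighted graph.
    """
    A = np.zeros((num_nodes, num_nodes))
    if weights is None:
        weights = [1.0] * len(edges)
    for (i, j), w in zip(edges, weights):
        A[i, j] = w
        A[j, i] = w  # mirror the entry so A = Aᵀ for undirected graphs
    return A
```

For example, `dense_adjacency(3, [(0, 1), (1, 2)])` returns a symmetric 3×3 matrix with ones at the connected positions.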
Figure: A simple undirected graph with 5 nodes and 6 edges.
For the graph above, the adjacency matrix (assuming nodes are indexed 0 to 4) would be:
$$A = \begin{pmatrix} 0 & 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 1 & 0 \\ 1 & 1 & 0 & 1 & 0 \\ 0 & 1 & 1 & 0 & 1 \\ 0 & 0 & 0 & 1 & 0 \end{pmatrix}$$

Real-world graphs are often sparse, meaning most entries in A are zero. This sparsity is significant for computational efficiency: storing and performing operations on dense N×N matrices becomes prohibitive for large N. Therefore, sparse matrix formats (like Coordinate List (COO), Compressed Sparse Row (CSR), or Compressed Sparse Column (CSC)) are typically used in GNN libraries. We will discuss these formats further in Chapter 5. The adjacency matrix directly determines the neighborhood N(v) used in the message passing framework.
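Because only 12 of the 25 entries above are nonzero, the same graph is stored more economically in a sparse format. The following sketch, assuming SciPy is available, builds the COO representation of this example and converts it to CSR; the printed arrays are the raw index structures those formats store.

```python
import numpy as np
from scipy.sparse import coo_matrix

# Edge list for the 5-node example graph (undirected, so both directions are stored).
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4)]
rows = [i for i, j in edges] + [j for i, j in edges]
cols = [j for i, j in edges] + [i for i, j in edges]
vals = np.ones(len(rows))

# COO format: parallel arrays of (row, col, value) triplets.
A_coo = coo_matrix((vals, (rows, cols)), shape=(5, 5))

# CSR format: compressed row pointers, efficient for row slicing and matrix products.
A_csr = A_coo.tocsr()

print(A_coo.toarray())               # dense view matches the matrix above
print(A_csr.indptr, A_csr.indices)   # CSR row pointers and column indices
```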
While the adjacency matrix captures direct connections, Graph Laplacians encode information about node degrees and graph structure in a way that's particularly useful for analyzing graph properties and forming the basis of spectral GNNs.
First, we define the Degree Matrix D. This is an N×N diagonal matrix where Dii is the degree of node vi (the number of edges connected to it), and all off-diagonal entries are zero.
$$D_{ii} = \deg(v_i) = \sum_{j} A_{ij}$$

Using A and D, we can define several forms of the Graph Laplacian:
Combinatorial Laplacian (L):
$$L = D - A$$

This is perhaps the most basic form. Its properties are linked to graph connectivity: for example, the number of connected components in the graph is equal to the multiplicity of the eigenvalue 0 of L. It is a symmetric positive semi-definite matrix.
Symmetric Normalized Laplacian (Lsym):
$$L_{\text{sym}} = D^{-1/2} L D^{-1/2} = I - D^{-1/2} A D^{-1/2}$$

Here, I is the identity matrix, and D^{-1/2} is the diagonal matrix with entries (Dii)^{-1/2}. This normalization keeps the matrix symmetric and ensures its eigenvalues are bounded between 0 and 2. This bounding is advantageous when designing spectral filters for GNNs, as it prevents numerical instability. Note that computing D^{-1/2} requires Dii > 0; nodes with degree 0 might need special handling depending on the application.
Random Walk Normalized Laplacian (Lrw):
$$L_{\text{rw}} = D^{-1} L = I - D^{-1} A$$

This version is generally not symmetric. It is closely related to the transition matrix of a random walk on the graph (P = D^{-1}A). The term D^{-1}A averages the features of a node's neighbors, normalized by the degree of the receiving (aggregating) node, a form of mean aggregation that underlies simple Graph Convolutional Network (GCN) style layers.
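The following minimal NumPy sketch, assuming every node has a positive degree, computes all three Laplacian forms for the example graph and checks the two spectral properties mentioned above (the multiplicity of eigenvalue 0 of L, and the [0, 2] bound on the eigenvalues of Lsym):

```python
import numpy as np

# Dense adjacency matrix of the 5-node example graph.
A = np.array([
    [0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0],
    [1, 1, 0, 1, 0],
    [0, 1, 1, 0, 1],
    [0, 0, 0, 1, 0],
], dtype=float)

deg = A.sum(axis=1)                          # node degrees
D = np.diag(deg)                             # degree matrix

L = D - A                                    # combinatorial Laplacian
D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))     # assumes deg > 0 for every node
L_sym = np.eye(len(deg)) - D_inv_sqrt @ A @ D_inv_sqrt   # symmetric normalized
L_rw = np.eye(len(deg)) - np.diag(1.0 / deg) @ A         # random walk normalized

# Spectral sanity checks on the two symmetric Laplacians.
eig_L = np.linalg.eigvalsh(L)
eig_sym = np.linalg.eigvalsh(L_sym)
print("connected components (multiplicity of eigenvalue 0):", int(np.sum(np.isclose(eig_L, 0.0))))
print("L_sym eigenvalues within [0, 2]:", eig_sym.min() >= -1e-9 and eig_sym.max() <= 2 + 1e-9)
```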
Laplacians capture relational information differently than the adjacency matrix. Their spectral properties (eigenvalues and eigenvectors) reveal significant structural information about the graph, such as connectivity and partitioning, forming the foundation of spectral graph theory and spectral GNNs, which we will explore further in Section 1.3 and Chapter 2.
Graphs in machine learning contexts rarely consist of only structure. Nodes and edges often carry attributes or features that provide valuable information.
Node features are typically represented as an N×F matrix X, where N is the number of nodes and F is the number of features per node. Each row Xv corresponds to the feature vector of node v.
$$X = \begin{bmatrix} x_1^T \\ x_2^T \\ \vdots \\ x_N^T \end{bmatrix}$$

These features can include raw numerical attributes (for example, measurements or sensor readings), one-hot encodings of categorical properties, or embeddings produced by other models such as text or image encoders.
The nature and quality of node features are critical for GNN performance. If features are absent, methods like assigning constant values, unique one-hot identifiers, or using structural features might be employed. Feature processing techniques like normalization or standardization are often applied. The initial node feature matrix X typically serves as the input hv(0) in the message passing framework.
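As a small illustration of these preprocessing choices, the sketch below (the function name is illustrative) standardizes an existing feature matrix and falls back to one-hot identifiers when no features are available; the result can serve as the initial hv(0).

```python
import numpy as np

def prepare_node_features(X=None, num_nodes=None):
    """Return an initial node feature matrix H0 of shape (N, F).

    If X is given, standardize each feature column (zero mean, unit variance).
    If X is None, fall back to one-hot identity features (one unique id per node).
    """
    if X is None:
        return np.eye(num_nodes)                  # unique one-hot identifier per node
    mean = X.mean(axis=0, keepdims=True)
    std = X.std(axis=0, keepdims=True) + 1e-8     # avoid division by zero
    return (X - mean) / std

# Example: 5 nodes with 3 raw features each, standardized and used as h^(0).
X = np.random.rand(5, 3)
H0 = prepare_node_features(X)
```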
Edges can also possess features, especially in contexts like knowledge graphs (where edges represent relations), molecular graphs (bond types), or transportation networks (road types, travel times). These can be represented in various ways, often as a tensor E of shape ∣E∣×Fe, where Fe is the number of edge features, or integrated directly into the message passing computation based on the specific edge connecting two nodes. Handling edge features requires specific GNN architectures, such as Relational GCN (RGCN), which we will cover in Chapter 4.
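A common in-memory layout keeps a 2×∣E∣ index array alongside a per-edge feature matrix. The sketch below shows this layout with NumPy; the names edge_index and edge_attr follow a widely used convention but are not tied to any specific library API here.

```python
import numpy as np

# Edge list stored in COO style: a 2 x |E| array of (source, target) indices.
# For an undirected graph, each edge appears once per direction.
edge_index = np.array([
    [0, 1, 0, 2, 1, 2, 1, 3, 2, 3, 3, 4],   # source nodes
    [1, 0, 2, 0, 2, 1, 3, 1, 3, 2, 4, 3],   # target nodes
])

# One feature vector per directed edge, shape (|E|, F_e); here F_e = 2,
# e.g., an edge weight and a numerically encoded edge type.
edge_attr = np.random.rand(edge_index.shape[1], 2)
```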
The choice of representation (A, L, Lsym, etc.) and the handling of features (X, E) directly impact the memory footprint and computational cost of training, the families of GNN architectures that can be applied (spectral methods, for instance, operate on a Laplacian, while spatial methods work directly with the adjacency structure), and ultimately which properties of the graph the model is able to learn.
Understanding these different ways to encode graph structure and features numerically is the first step towards building and analyzing the advanced GNN architectures discussed in the subsequent chapters. We now have the vocabulary to discuss message passing and spectral methods with greater precision.