趋近智
Before we can feed a graph into a machine learning model, it's good practice to understand its fundamental characteristics. Just as you might calculate the mean, median, and standard deviation of a feature in a tabular dataset, we can compute properties of a graph to get a quantitative summary of its structure. These measurements help us build intuition about the data and can inform our modeling decisions.
We can start by characterizing the graph from the perspective of its individual nodes. These metrics often quantify a node's importance or role within the network.
The most fundamental node-level property is its degree, which is simply the number of edges connected to it. In a social network, a user's degree is the number of friends they have. For an undirected graph, the degree of a node i, denoted d(i), can be calculated directly from the adjacency matrix A:
d(i)=j=1∑∣V∣AijIn a directed graph, the concept of degree is split into two:
Nodes with a very high degree are often called "hubs" and can play a significant role in how information flows through the network.
A simple graph where node A has a degree of 3, node D has a degree of 2, and nodes B, C, and E each have a degree of 1.
While degree measures direct connectivity, centrality provides a more sophisticated view of a node's structural importance. There are several ways to measure it:
Betweenness Centrality: This measures how often a node acts as a bridge along the shortest path between two other nodes. A node with high betweenness centrality has a large influence over the transfer of items through the network. Removing such a node could disconnect parts of the graph.
Closeness Centrality: This measures the average distance from a node to all other reachable nodes in the graph. A node with high closeness centrality can reach others efficiently, making it a good broadcasting point for information.
Eigenvector Centrality: This is a measure of influence. It posits that a node's importance is determined by the importance of its neighbors. A node connected to many high-degree hubs will have a higher eigenvector centrality than a node connected to many low-degree nodes. Google's PageRank algorithm is a variant of this measure.
Nodes C and D have high betweenness centrality as they form the only bridge between the two clusters of nodes.
Moving from individual nodes, we can also compute properties that describe the graph as a whole.
The most basic properties are the total number of nodes, ∣V∣, and the total number of edges, ∣E∣. These give us a sense of the scale of the graph.
From these, we can calculate the graph density, which measures how many edges exist in the graph compared to the maximum number of possible edges. For an undirected graph, the formula is:
Density=∣V∣(∣V∣−1)2∣E∣A graph with a density close to 1.0 is considered dense, while a graph with a density close to 0 is sparse. Most graphs, from social networks to molecular structures, are extremely sparse. This sparsity is a significant property that GNN libraries use for computational efficiency.
A comparison between a sparse graph, with few connections, and a dense graph, where nodes are highly interconnected.
Connected Components: An undirected graph is connected if there is a path between any two nodes. If it is not connected, it consists of multiple connected components, which are disjoint subgraphs. Identifying these components can be useful; for instance, you might decide to train a separate model for each one.
Average Path Length: This is the average shortest path distance between all pairs of reachable nodes. A small average path length in a large, sparse graph is a hallmark of "small-world" networks, a common phenomenon observed in many social and biological systems.
These properties are not just for descriptive analysis. They can be incorporated directly into machine learning pipelines as features. For example, a node's degree or its centrality score can serve as an input feature to a GNN, providing the model with explicit information about its structural role. In the next section, we will see how to compute these metrics using a popular Python library.
这部分内容有帮助吗?
© 2026 ApX Machine Learning用心打造