Real-world graph data often involves entities and connections of different kinds. Think of a bibliographic network with 'author', 'paper', and 'venue' nodes, connected by 'writes', 'cites', and 'published_in' edges. Standard Graph Neural Networks, designed primarily for homogeneous graphs (single node and edge type), struggle to effectively capture the rich semantics embedded within these diverse relationships. Applying a single message passing function across all edge types implicitly treats them as identical, which is often an oversimplification. Handling heterogeneous graphs requires architectures specifically designed to differentiate and leverage the unique characteristics of various node and edge types.
This section introduces two prominent GNN architectures developed for heterogeneous graphs: Relational Graph Convolutional Networks (RGCN) and Heterogeneous Graph Attention Networks (HAN).
In a homogeneous graph $G=(V,E)$, a standard GNN layer updates a node $v$'s representation $h_v$ by aggregating messages from its neighbors $N(v)$:
$$h_v^{(l+1)} = \sigma\left( \sum_{u \in N(v) \cup \{v\}} \frac{1}{c_{uv}} W^{(l)} h_u^{(l)} \right)$$

Here, $W^{(l)}$ is a shared weight matrix for layer $l$, and $c_{uv}$ is a normalization constant. The core issue in heterogeneous graphs, written $G'=(V,E,T_V,T_E)$ where $T_V$ and $T_E$ are the sets of node and edge types, is that a single $W^{(l)}$ cannot adequately model the distinct transformations associated with different relation types $r \in T_E$. For example, the way an 'author' node influences a 'paper' node via a 'writes' edge is semantically different from how a 'paper' node influences another 'paper' node via a 'cites' edge.
Figure: A simple heterogeneous graph illustrating different node types (Author, Paper, Venue) and edge types (writes, cites, published_in).
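To make the baseline concrete, here is a minimal PyTorch sketch of such a homogeneous layer, using a dense adjacency matrix and the simple normalization $c_{uv} = |N(v)|$; all class and variable names are illustrative rather than taken from any library:

```python
import torch
import torch.nn as nn

class HomogeneousGNNLayer(nn.Module):
    """Minimal GCN-style layer: one shared weight matrix W for every edge."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, h, adj):
        # adj: (N, N) adjacency matrix with self-loops already added,
        # so the sum runs over N(v) plus v itself, as in the formula above.
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)  # c_uv normalization
        return torch.relu(self.W(adj @ h) / deg)

# Toy usage: 4 nodes on a path graph, 8-dim features.
h = torch.randn(4, 8)
edges = torch.tensor([[0, 1, 0, 0],
                      [1, 0, 1, 0],
                      [0, 1, 0, 1],
                      [0, 0, 1, 0]], dtype=torch.float)
adj = edges + torch.eye(4)
layer = HomogeneousGNNLayer(8, 16)
print(layer(h, adj).shape)  # torch.Size([4, 16])
```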
RGCNs directly address heterogeneity by introducing relation-specific transformation matrices. For each relation type $r \in T_E$, a unique weight matrix $W_r^{(l)}$ is learned. The message passing update for a node $v$ in an RGCN layer becomes:
$$h_v^{(l+1)} = \sigma\left( W_{\text{self}}^{(l)} h_v^{(l)} + \sum_{r \in T_E} \sum_{u \in N_r(v)} \frac{1}{c_{uv,r}} W_r^{(l)} h_u^{(l)} \right)$$

where $N_r(v)$ is the set of neighbors of node $v$ connected by an edge of type $r$, $W_r^{(l)}$ is the weight matrix for relation $r$ at layer $l$, $W_{\text{self}}^{(l)}$ is a weight matrix for the self-connection (optional but common), and $c_{uv,r}$ is a relation-specific normalization constant (e.g., $|N_r(v)|$).
This formulation allows the model to learn distinct transformations based on the type of relationship connecting two nodes.
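The following is a minimal PyTorch sketch of this update, using one dense adjacency matrix per relation and the normalization $c_{uv,r} = |N_r(v)|$; it illustrates the formula above rather than a production implementation (libraries such as PyTorch Geometric provide optimized sparse versions):

```python
import torch
import torch.nn as nn

class RGCNLayer(nn.Module):
    """Minimal RGCN layer: one dense weight matrix per relation type,
    plus a self-connection, over dense per-relation adjacency matrices."""
    def __init__(self, in_dim, out_dim, num_relations):
        super().__init__()
        self.W_rel = nn.ModuleList(
            [nn.Linear(in_dim, out_dim, bias=False) for _ in range(num_relations)]
        )
        self.W_self = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, h, adjs):
        # adjs[r]: (N, N) adjacency matrix containing only edges of type r.
        out = self.W_self(h)  # self-connection term
        for A_r, W_r in zip(adjs, self.W_rel):
            deg_r = A_r.sum(dim=1, keepdim=True).clamp(min=1)  # c_{uv,r} = |N_r(v)|
            out = out + W_r(A_r @ h) / deg_r
        return torch.relu(out)

# Toy usage: 4 nodes, 8-dim features, 2 relation types with random edges.
h = torch.randn(4, 8)
adjs = [torch.bernoulli(torch.full((4, 4), 0.3)) for _ in range(2)]
layer = RGCNLayer(8, 16, num_relations=2)
print(layer(h, adjs).shape)  # torch.Size([4, 16])
```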
A significant practical challenge with RGCNs arises when the number of relation types $|T_E|$ is large. Learning a separate dense matrix $W_r^{(l)}$ for each relation can lead to a massive number of parameters: with 1,000 relation types and 256-dimensional hidden layers, for instance, a single layer would need roughly $1{,}000 \times 256 \times 256 \approx 65.5$ million parameters. This increases the risk of overfitting, especially with limited training data per relation type.
To mitigate this, RGCNs often employ regularization techniques:
Basis Decomposition: The relation-specific matrices $W_r^{(l)}$ are constrained to be linear combinations of a smaller set of shared basis transformations $B_i^{(l)}$:
$$W_r^{(l)} = \sum_{i=1}^{B} a_{ri}^{(l)} B_i^{(l)}$$

Here, only the basis matrices $B_i^{(l)}$ and the scalar coefficients $a_{ri}^{(l)}$ need to be learned, drastically reducing the parameter count when $B \ll |T_E|$.
Block-Diagonal Decomposition: Each $W_r^{(l)}$ is structured as a block-diagonal matrix:
$$W_r^{(l)} = \mathrm{diag}\left(W_{r,1}^{(l)}, W_{r,2}^{(l)}, \ldots, W_{r,B}^{(l)}\right)$$

Each $W_{r,i}^{(l)}$ is a smaller dense matrix. This creates sparsity in the weight matrices and reduces parameters, essentially decomposing the feature space into lower-dimensional subspaces where relations operate independently. Minimal sketches of both decompositions follow below.
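Both decompositions are straightforward to express in PyTorch. The sketches below are illustrative parameterizations of the two formulas above; initialization schemes and wiring into a full RGCN layer are omitted. First, basis decomposition, assembling all relation matrices from $B$ shared bases:

```python
import torch
import torch.nn as nn

class BasisDecomposedWeights(nn.Module):
    """Builds every relation matrix W_r as a linear combination of B bases."""
    def __init__(self, in_dim, out_dim, num_relations, num_bases):
        super().__init__()
        self.bases = nn.Parameter(torch.randn(num_bases, in_dim, out_dim))  # B_i
        self.coeffs = nn.Parameter(torch.randn(num_relations, num_bases))   # a_ri

    def forward(self):
        # W_r = sum_i a_ri * B_i, computed for all relations at once.
        return torch.einsum('rb,bio->rio', self.coeffs, self.bases)

# 100 relations sharing 4 bases: 4*8*16 + 100*4 = 912 parameters
# instead of 100*8*16 = 12,800 for fully independent matrices.
W = BasisDecomposedWeights(8, 16, num_relations=100, num_bases=4)()
print(W.shape)  # torch.Size([100, 8, 16])
```

And block-diagonal decomposition, here for a single relation, splitting the feature space into $B$ independent lower-dimensional subspaces:

```python
import torch
import torch.nn as nn

class BlockDiagonalWeight(nn.Module):
    """One relation's weight matrix as B independent diagonal blocks."""
    def __init__(self, in_dim, out_dim, num_blocks):
        super().__init__()
        assert in_dim % num_blocks == 0 and out_dim % num_blocks == 0
        self.blocks = nn.Parameter(
            torch.randn(num_blocks, in_dim // num_blocks, out_dim // num_blocks)
        )

    def forward(self, h):
        # Split features into B subspaces and transform each independently;
        # equivalent to multiplying by diag(W_r1, ..., W_rB).
        n, b = h.size(0), self.blocks.size(0)
        out = torch.einsum('nbi,bio->nbo', h.view(n, b, -1), self.blocks)
        return out.reshape(n, -1)

h = torch.randn(4, 8)
W_r = BlockDiagonalWeight(8, 16, num_blocks=4)  # 4*(2*4) = 32 params vs 128 dense
print(W_r(h).shape)  # torch.Size([4, 16])
```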
These techniques allow RGCNs to scale to graphs with numerous relation types while maintaining model capacity. RGCNs are particularly effective for tasks like node classification and link prediction in knowledge graphs.
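As a usage sketch (assuming PyTorch Geometric is installed), PyG's `RGCNConv` exposes basis decomposition through its `num_bases` argument and the block-diagonal variant through `num_blocks`; the toy sizes below are arbitrary:

```python
import torch
from torch_geometric.nn import RGCNConv

# 10 nodes with 16-dim features; 40 random edges over 3 relation types.
x = torch.randn(10, 16)
edge_index = torch.randint(0, 10, (2, 40))  # (source, target) node indices
edge_type = torch.randint(0, 3, (40,))      # relation id for each edge

# RGCNConv with basis decomposition: 3 relation matrices built from 2 bases.
conv = RGCNConv(16, 32, num_relations=3, num_bases=2)
out = conv(x, edge_index, edge_type)
print(out.shape)  # torch.Size([10, 32])
```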
HAN takes a different approach, utilizing attention mechanisms to automatically learn the importance of different neighbors and relation types (or, more generally, meta-paths). It employs a hierarchical attention structure:
Node-Level Attention: For a specific meta-path $\Phi$ (a sequence of relation types connecting node types, e.g., Author $\xrightarrow{\text{writes}}$ Paper $\xrightarrow{\text{cites}}$ Paper), HAN first learns attention weights for neighbors reachable via that meta-path. Given a target node $v$, for each neighbor $u$ connected through meta-path $\Phi$, an attention weight $a_{vu}^{\Phi}$ is calculated, similar to Graph Attention Networks (GAT), but specific to $\Phi$. This allows the model to prioritize more relevant neighbors within the context of that meta-path. The node features are transformed (potentially using type-specific matrices) and then aggregated using these attention weights to obtain a meta-path-specific embedding $z_v^{\Phi}$:
$$z_v^{\Phi} = \sigma\left( \sum_{u \in N_v^{\Phi}} a_{vu}^{\Phi} \cdot W_{\Phi} h_u \right)$$

where $N_v^{\Phi}$ is the set of neighbors reachable via meta-path $\Phi$, and $W_{\Phi}$ is a transformation matrix associated with the meta-path.
Semantic-Level Attention: Since different meta-paths capture different semantic aspects and may have varying relevance for a given task and node, HAN introduces a second level of attention. It learns attention weights $w_{\Phi_i}$ for each of the meta-paths $\Phi_1, \Phi_2, \ldots, \Phi_P$ considered. These weights reflect the importance of each semantic view captured by the meta-paths. The final node embedding $h_v'$ is obtained by combining the meta-path-specific embeddings $z_v^{\Phi_i}$ weighted by the semantic attention scores:
$$h_v' = \sum_{i=1}^{P} w_{\Phi_i} \cdot z_v^{\Phi_i}$$

The semantic attention weights $w_{\Phi_i}$ are typically computed from the meta-path-specific embeddings themselves, often by projecting them and applying a softmax normalization. Minimal sketches of both attention levels follow below.
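The following is a minimal PyTorch sketch of the two attention levels, with all names illustrative. First, node-level attention for one meta-path $\Phi$, in the GAT style of the formula above; `edge_index` holds (source, target) pairs of meta-path-based neighbors, and the per-node softmax loop is written for clarity rather than speed:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NodeLevelAttention(nn.Module):
    """GAT-style attention over neighbors reachable via one meta-path."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W_phi = nn.Linear(in_dim, out_dim, bias=False)  # meta-path projection
        self.attn = nn.Linear(2 * out_dim, 1, bias=False)    # raw scores e_vu

    def forward(self, h, edge_index):
        z = self.W_phi(h)
        src, dst = edge_index
        e = F.leaky_relu(self.attn(torch.cat([z[dst], z[src]], dim=-1))).squeeze(-1)
        a = torch.zeros_like(e)
        for v in dst.unique():            # softmax over each node's neighborhood
            mask = dst == v
            a[mask] = F.softmax(e[mask], dim=0)
        out = torch.zeros_like(z)         # z_v = sigma(sum_u a_vu * z_u)
        out.index_add_(0, dst, a.unsqueeze(-1) * z[src])
        return F.elu(out)

h = torch.randn(4, 8)
edge_index = torch.tensor([[0, 1, 2, 3],   # sources u
                           [1, 0, 1, 2]])  # targets v
print(NodeLevelAttention(8, 16)(h, edge_index).shape)  # torch.Size([4, 16])
```

And semantic-level attention: stack the meta-path-specific embeddings, score each meta-path by projecting and averaging over all nodes, then fuse with softmax weights (the projection size of 128 is an arbitrary choice here):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SemanticLevelAttention(nn.Module):
    """Learns one importance weight per meta-path and fuses the embeddings."""
    def __init__(self, dim, hidden=128):
        super().__init__()
        self.project = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh())
        self.q = nn.Linear(hidden, 1, bias=False)  # shared semantic vector

    def forward(self, z):
        # z: (P, N, dim) stacked meta-path-specific embeddings z_v^{Phi_i}
        scores = self.q(self.project(z)).mean(dim=1)  # one score per meta-path
        w = F.softmax(scores, dim=0)                  # semantic weights w_{Phi_i}
        return (w.unsqueeze(-1) * z).sum(dim=0)       # fused embeddings h'_v

z = torch.stack([torch.randn(4, 16) for _ in range(3)])  # P = 3 meta-paths
print(SemanticLevelAttention(16)(z).shape)  # torch.Size([4, 16])
```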
Meta-paths are sequences of node and edge types (e.g., Author-Paper-Author, Paper-Venue-Paper) that define composite relations in a heterogeneous graph. They are often predefined based on domain knowledge and are central to HAN's operation, guiding the neighborhood sampling and attention calculations. The choice of meta-paths significantly influences HAN's performance.
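In code, meta-paths are commonly represented as ordered lists of (source type, relation, destination type) triples. The sketch below uses the bibliographic schema from this section; the reverse relations 'written_by' and 'publishes' are assumed for illustration, since only the forward edges were named above:

```python
# Meta-paths as sequences of (source type, relation, destination type) triples.
# 'written_by' and 'publishes' are assumed reverse edges, added for illustration.
APA = [("author", "writes", "paper"),
       ("paper", "written_by", "author")]        # co-authorship between authors
PVP = [("paper", "published_in", "venue"),
       ("venue", "publishes", "paper")]          # papers sharing a venue
meta_paths = [APA, PVP]
```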
HAN's strength lies in its ability to adaptively weight information from different structural and semantic contexts defined by meta-paths, making it effective for node classification tasks where complex relational patterns are important.
Practical implementations of both architectures are available in the major GNN libraries: PyTorch Geometric (e.g., `RGCNConv`, and `HGTConv`, which builds on similar principles for heterogeneity) and DGL (e.g., `RelGraphConv`, plus modules for heterogeneous graph handling).

Choosing between RGCN and HAN often depends on the specific graph structure, the nature of the task, and whether meaningful meta-paths can be easily defined. Both represent significant advancements in applying GNNs beyond simple, homogeneous structures, enabling the analysis of richer, more realistic graph data.