Graph data often involves entities and connections of different kinds. Think of a bibliographic network with 'author', 'paper', and 'venue' nodes, connected by 'writes', 'cites', and 'published_in' edges. Standard Graph Neural Networks, designed primarily for homogeneous graphs (single node and edge type), struggle to effectively capture the rich semantics embedded within these diverse relationships. Applying a single message passing function across all edge types implicitly treats them as identical, which is often an oversimplification. Handling heterogeneous graphs requires architectures specifically designed to differentiate and leverage the unique characteristics of various node and edge types.
This section introduces two prominent GNN architectures developed for heterogeneous graphs: Relational Graph Convolutional Networks (RGCN) and Heterogeneous Graph Attention Networks (HAN).
In a homogeneous graph, a standard GNN layer updates a node $v$'s representation by aggregating messages from its neighbors $\mathcal{N}(v)$:

$$h_v^{(l+1)} = \sigma\left( \sum_{u \in \mathcal{N}(v)} \frac{1}{c_v} W^{(l)} h_u^{(l)} \right)$$
Here, $W^{(l)}$ is a shared weight matrix for layer $l$, and $c_v$ is a normalization constant. The core issue in heterogeneous graphs, defined as $G = (V, E)$ with a node-type mapping $\phi: V \to \mathcal{A}$ and an edge-type mapping $\psi: E \to \mathcal{R}$, where $\mathcal{A}$ and $\mathcal{R}$ are the sets of node and edge types, is that a single $W^{(l)}$ cannot adequately model the distinct transformations associated with different relation types $r \in \mathcal{R}$. For example, the way an 'author' node influences a 'paper' node via a 'writes' edge is semantically different from how a 'paper' node influences another 'paper' node via a 'cites' edge.
A simple heterogeneous graph illustrating different node types (Author, Paper, Venue) and edge types (writes, cites, published_in).
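To make this concrete, here is a minimal sketch of how such a graph could be represented with PyTorch Geometric's HeteroData container; the node counts, feature dimensions, and edge counts are illustrative placeholders, not values from the text.

```python
import torch
from torch_geometric.data import HeteroData

data = HeteroData()
# Node features, one tensor per node type (shapes are illustrative)
data['author'].x = torch.randn(100, 32)
data['paper'].x = torch.randn(500, 64)
data['venue'].x = torch.randn(20, 16)
# Edges are keyed by (source_type, relation, target_type) triples
data['author', 'writes', 'paper'].edge_index = torch.stack(
    [torch.randint(0, 100, (1000,)), torch.randint(0, 500, (1000,))])
data['paper', 'cites', 'paper'].edge_index = torch.stack(
    [torch.randint(0, 500, (2000,)), torch.randint(0, 500, (2000,))])
data['paper', 'published_in', 'venue'].edge_index = torch.stack(
    [torch.randint(0, 500, (500,)), torch.randint(0, 20, (500,))])
```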
RGCNs directly address heterogeneity by introducing relation-specific transformation matrices. For each relation type $r \in \mathcal{R}$, a unique weight matrix $W_r$ is learned. The message passing update for a node $v$ in an RGCN layer becomes:

$$h_v^{(l+1)} = \sigma\left( \sum_{r \in \mathcal{R}} \sum_{u \in \mathcal{N}_v^r} \frac{1}{c_{v,r}} W_r^{(l)} h_u^{(l)} + W_0^{(l)} h_v^{(l)} \right)$$

where $\mathcal{N}_v^r$ is the set of neighbors of node $v$ connected by an edge of type $r$, $W_r^{(l)}$ is the weight matrix for relation $r$ at layer $l$, $W_0^{(l)}$ is a weight matrix for the self-connection (optional but common), and $c_{v,r}$ is a relation-specific normalization constant (e.g., $c_{v,r} = |\mathcal{N}_v^r|$).
This formulation allows the model to learn distinct transformations based on the type of relationship connecting two nodes.
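The following is a minimal sketch of this update in plain PyTorch, assuming edges arrive pre-grouped by relation type and using $c_{v,r} = |\mathcal{N}_v^r|$ for normalization; the library implementations discussed later are more efficient.

```python
import torch
import torch.nn as nn

class RGCNLayer(nn.Module):
    """Minimal RGCN layer implementing the update above (no regularization)."""
    def __init__(self, in_dim, out_dim, num_relations):
        super().__init__()
        self.rel_weights = nn.Parameter(
            torch.randn(num_relations, in_dim, out_dim) * 0.01)  # W_r
        self.self_weight = nn.Linear(in_dim, out_dim, bias=False)  # W_0

    def forward(self, h, edge_index_per_rel):
        # edge_index_per_rel: dict mapping relation id -> (2, E_r) edge tensor
        out = self.self_weight(h)  # self-connection term W_0 h_v
        for r, edge_index in edge_index_per_rel.items():
            src, dst = edge_index
            msg = h[src] @ self.rel_weights[r]  # W_r h_u for each edge
            agg = torch.zeros_like(out).index_add_(0, dst, msg)
            # normalize by c_{v,r} = |N_v^r| (clamped to avoid division by zero)
            deg = torch.bincount(dst, minlength=h.size(0)).clamp(min=1)
            out = out + agg / deg.unsqueeze(-1)
        return torch.relu(out)
```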
A significant practical challenge with RGCNs arises when the number of relation types is large. Learning a separate dense matrix for each relation can lead to a massive number of parameters, increasing the risk of overfitting, especially with limited training data per relation type.
To mitigate this, RGCNs often employ regularization techniques:
Basis Decomposition: The relation-specific matrices are constrained to be linear combinations of a smaller set of $B$ shared basis transformations $V_b^{(l)}$:

$$W_r^{(l)} = \sum_{b=1}^{B} a_{rb}^{(l)} V_b^{(l)}$$

Here, only the basis matrices $V_b^{(l)}$ and the scalar coefficients $a_{rb}^{(l)}$ need to be learned, drastically reducing the parameter count if $B \ll |\mathcal{R}|$.
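A minimal PyTorch sketch of basis decomposition (dimensions and initialization are illustrative): only the $B$ basis matrices and an $|\mathcal{R}| \times B$ coefficient table are stored, and the per-relation matrices are materialized on the fly.

```python
import torch
import torch.nn as nn

class BasisWeights(nn.Module):
    """Relation weights W_r as linear combinations of B shared bases V_b."""
    def __init__(self, num_relations, num_bases, in_dim, out_dim):
        super().__init__()
        self.bases = nn.Parameter(
            torch.randn(num_bases, in_dim, out_dim) * 0.01)  # V_b
        self.coeffs = nn.Parameter(
            torch.randn(num_relations, num_bases))  # a_rb

    def forward(self):
        # W_r = sum_b a_rb * V_b, computed for all relations at once
        return torch.einsum('rb,bio->rio', self.coeffs, self.bases)
```

This stores $B \cdot d_{in} \cdot d_{out} + |\mathcal{R}| \cdot B$ parameters instead of $|\mathcal{R}| \cdot d_{in} \cdot d_{out}$.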
Block-Diagonal Decomposition: Each $W_r^{(l)}$ is structured as a block-diagonal matrix of $B$ smaller blocks:

$$W_r^{(l)} = \mathrm{diag}\left(Q_{r1}^{(l)}, Q_{r2}^{(l)}, \dots, Q_{rB}^{(l)}\right)$$

Each $Q_{rb}^{(l)}$ is a smaller dense matrix. This creates sparsity in the weight matrices and reduces parameters, essentially decomposing the feature space into lower-dimensional subspaces where relations operate independently.
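A corresponding sketch for the block-diagonal variant, assuming the input and output dimensions are divisible by the number of blocks:

```python
import torch
import torch.nn as nn

class BlockDiagWeights(nn.Module):
    """Relation weights W_r as block-diagonal matrices of B blocks Q_rb."""
    def __init__(self, num_relations, num_blocks, in_dim, out_dim):
        super().__init__()
        assert in_dim % num_blocks == 0 and out_dim % num_blocks == 0
        self.blocks = nn.Parameter(
            torch.randn(num_relations, num_blocks,
                        in_dim // num_blocks, out_dim // num_blocks) * 0.01)

    def forward(self):
        # Assemble each W_r = diag(Q_r1, ..., Q_rB)
        return torch.stack([torch.block_diag(*rel_blocks)
                            for rel_blocks in self.blocks])
```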
These techniques allow RGCNs to scale to graphs with numerous relation types while maintaining model capacity. RGCNs are particularly effective for tasks like node classification and link prediction in knowledge graphs.
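Both regularizers are exposed directly in PyTorch Geometric's RGCNConv through the num_bases and num_blocks arguments; a small usage sketch, with graph sizes chosen purely for illustration:

```python
import torch
from torch_geometric.nn import RGCNConv

# 1000 nodes with 64-dim features, 40 relation types, 8 shared bases
x = torch.randn(1000, 64)
edge_index = torch.randint(0, 1000, (2, 5000))
edge_type = torch.randint(0, 40, (5000,))

conv = RGCNConv(in_channels=64, out_channels=32, num_relations=40, num_bases=8)
out = conv(x, edge_index, edge_type)  # shape: (1000, 32)
```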
HAN takes a different approach, utilizing attention mechanisms to automatically learn the importance of different neighbors and relation types (or, more generally, meta-paths). It employs a hierarchical attention structure:
Node-Level Attention: For a specific meta-path $\Phi$ (a sequence of relation types connecting node types, e.g., Author $\rightarrow$ Paper $\rightarrow$ Paper), HAN first learns attention weights for neighbors reachable via that meta-path. Given a target node $v$, for each neighbor $u$ connected through meta-path $\Phi$, an attention weight $\alpha_{vu}^{\Phi}$ is calculated, similar to Graph Attention Networks (GAT), but specific to $\Phi$. This allows the model to prioritize more relevant neighbors within the context of that meta-path. The node features are transformed (potentially using type-specific matrices) and then aggregated using these attention weights to get a meta-path-specific embedding $z_v^{\Phi}$:

$$z_v^{\Phi} = \sigma\left( \sum_{u \in \mathcal{N}_v^{\Phi}} \alpha_{vu}^{\Phi} \, W_{\Phi} h_u \right)$$

where $\mathcal{N}_v^{\Phi}$ are neighbors reachable via meta-path $\Phi$, and $W_{\Phi}$ is a transformation matrix associated with the meta-path.
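A simplified sketch of node-level attention for a single meta-path, following the GAT-style scoring described above; the edge list of meta-path neighbors is assumed to be precomputed, and real implementations add multi-head attention and dropout.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MetaPathNodeAttention(nn.Module):
    """GAT-style attention over neighbors reachable via one meta-path."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)  # projection W_Phi
        self.a = nn.Linear(2 * out_dim, 1, bias=False)   # attention vector

    def forward(self, h, edge_index):
        # edge_index: (2, E) pairs (u, v), u a meta-path neighbor of target v
        z = self.W(h)
        src, dst = edge_index
        # Unnormalized scores e_vu = LeakyReLU(a^T [z_v || z_u])
        e = F.leaky_relu(self.a(torch.cat([z[dst], z[src]], dim=-1))).squeeze(-1)
        # Softmax over each target node's meta-path neighborhood
        e = (e - e.max()).exp()
        denom = torch.zeros(h.size(0), device=h.device).index_add_(0, dst, e)
        alpha = e / denom[dst].clamp(min=1e-12)
        # z_v^Phi = sigma(sum_u alpha_vu^Phi * W_Phi h_u)
        out = torch.zeros_like(z).index_add_(0, dst, alpha.unsqueeze(-1) * z[src])
        return F.elu(out)
```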
Semantic-Level Attention: Since different meta-paths capture different semantic aspects and may have varying relevance for a given task and node, HAN introduces a second level of attention. It learns attention weights $\beta_{\Phi}$ for each meta-path $\Phi$ considered. These weights reflect the importance of each semantic view captured by the meta-paths. The final node embedding $z_v$ is obtained by combining the meta-path-specific embeddings weighted by the semantic attention scores:

$$z_v = \sum_{\Phi \in \{\Phi_1, \dots, \Phi_P\}} \beta_{\Phi} \, z_v^{\Phi}$$
The semantic attention weights $\beta_{\Phi}$ are typically computed from the meta-path-specific embeddings themselves: each $z_v^{\Phi}$ is passed through a nonlinear projection, scored against a learnable semantic vector $q$, averaged over nodes, and normalized with a softmax, as in the original HAN formulation:

$$w_{\Phi} = \frac{1}{|V|} \sum_{v \in V} q^{\top} \tanh\left(W z_v^{\Phi} + b\right), \qquad \beta_{\Phi} = \frac{\exp(w_{\Phi})}{\sum_{\Phi'} \exp(w_{\Phi'})}$$
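A compact sketch of this semantic fusion step, assuming the meta-path-specific embeddings have been stacked into a single tensor; the hidden size of the projection is an arbitrary choice.

```python
import torch
import torch.nn as nn

class SemanticAttention(nn.Module):
    """Learn importance beta_Phi of each meta-path and fuse embeddings."""
    def __init__(self, dim, hidden=128):
        super().__init__()
        self.proj = nn.Linear(dim, hidden)           # W z + b
        self.q = nn.Parameter(torch.randn(hidden))   # shared semantic vector q

    def forward(self, z):  # z: (num_metapaths, num_nodes, dim)
        # w_Phi = mean over nodes of q^T tanh(W z_v^Phi + b)
        w = (torch.tanh(self.proj(z)) @ self.q).mean(dim=1)  # (num_metapaths,)
        beta = torch.softmax(w, dim=0)                       # (num_metapaths,)
        return (beta.view(-1, 1, 1) * z).sum(dim=0)          # (num_nodes, dim)
```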
Meta-paths are sequences of node and edge types (e.g., Author-Paper-Author, Paper-Venue-Paper) that define composite relations in a heterogeneous graph. They are often predefined based on domain knowledge and are central to HAN's operation, guiding the neighborhood sampling and attention calculations. The choice of meta-paths significantly influences HAN's performance.
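Meta-path neighborhoods are often materialized by multiplying the incidence matrices of the constituent relations; a small sketch for an Author-Paper-Author (co-authorship) meta-path, using random placeholder data:

```python
import torch

# Binary author-paper incidence matrix (shape illustrative): writes[a, p] = 1
writes = torch.zeros(100, 500)
writes[torch.randint(0, 100, (300,)), torch.randint(0, 500, (300,))] = 1.0

# Author-Paper-Author: compose 'writes' with its transpose ('written_by')
apa = ((writes @ writes.t()) > 0).float()  # (100, 100) co-authorship adjacency
apa.fill_diagonal_(0.0)                    # remove trivial self-connections
```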
HAN's strength lies in its ability to adaptively weight information from different structural and semantic contexts defined by meta-paths, making it effective for node classification tasks where complex relational patterns are important.
Both architectures have mature implementations in popular libraries such as PyTorch Geometric (e.g., RGCNConv and HANConv, as well as HGTConv, which builds on similar principles for heterogeneity) and DGL (e.g., RelGraphConv and dedicated modules for heterogeneous graph handling). Choosing between RGCN and HAN often depends on the specific graph structure, the nature of the task, and whether meaningful meta-paths can be easily defined. Both represent significant advancements in applying GNNs to more complex, realistic graph data.
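As a closing sketch, recent PyTorch Geometric versions provide HANConv, which treats each edge type as a meta-path and applies both attention levels internally; the graph below reuses the illustrative HeteroData pattern from earlier.

```python
import torch
from torch_geometric.data import HeteroData
from torch_geometric.nn import HANConv

data = HeteroData()
data['author'].x = torch.randn(100, 32)
data['paper'].x = torch.randn(500, 64)
data['author', 'writes', 'paper'].edge_index = torch.stack(
    [torch.randint(0, 100, (1000,)), torch.randint(0, 500, (1000,))])
# Add the reverse relation so 'author' nodes also receive messages
data['paper', 'written_by', 'author'].edge_index = \
    data['author', 'writes', 'paper'].edge_index.flip(0)

conv = HANConv(in_channels={'author': 32, 'paper': 64}, out_channels=16,
               metadata=data.metadata(), heads=2)
out_dict = conv(data.x_dict, data.edge_index_dict)  # per-type node embeddings
```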