Traditional neural networks, like CNNs and RNNs, excel at processing data with grid-like structures (images) or sequential patterns (text, time series). However, many real-world datasets are better represented as graphs, networks of interconnected entities. Examples include social networks, molecular structures, knowledge graphs, and recommendation systems. Graph Neural Networks (GNNs) are a class of deep learning models designed specifically to operate on graph-structured data.
This section introduces the fundamental principles behind GNNs and outlines how you can begin implementing them using TensorFlow's core APIs. We assume you are familiar with graph theory basics: nodes (vertices), edges (links), and associated features.
Before building a GNN, we need a way to represent graph data numerically. Common representations include:

- Adjacency matrix: A tf.SparseTensor is often suitable for representing sparse adjacency matrices efficiently.
- Node features: A dense tf.Tensor holding one feature vector per node, with shape [num_nodes, input_dim].
- Edge list: A tf.Tensor of shape [num_edges, 2], where each row holds the indices of a connected node pair. Edge features can be stored in a parallel tensor.

Here's a simple visualization of a graph:
A small undirected graph with four nodes (A, B, C, D) and several edges connecting them. Each node might have its own features (e.g., user profile information, atom type), and edges could also have features (e.g., relationship type, bond strength).
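For concreteness, here is a minimal sketch of how a graph like this could be encoded with the representations above. The specific edges and feature values below are illustrative placeholders, not taken from the figure:

import tensorflow as tf

# Nodes A, B, C, D mapped to indices 0..3 (hypothetical edge set).
# Edge list of shape [num_edges, 2]; for an undirected graph each edge
# is stored in both directions.
edge_list = tf.constant([[0, 1], [1, 0],   # A - B
                         [0, 2], [2, 0],   # A - C
                         [1, 3], [3, 1],   # B - D
                         [2, 3], [3, 2]],  # C - D
                        dtype=tf.int64)

# Node feature matrix of shape [num_nodes, input_dim] (random placeholders).
node_features = tf.random.normal([4, 8])

# Sparse adjacency matrix built from the edge list.
adjacency = tf.SparseTensor(indices=edge_list,
                            values=tf.ones([8], dtype=tf.float32),
                            dense_shape=[4, 4])
adjacency = tf.sparse.reorder(adjacency)  # canonical row-major ordering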
Most GNNs operate based on a principle called message passing or neighborhood aggregation. The intuition is that a node's representation should be influenced by the representations of its neighbors. This process typically happens iteratively across layers. In each layer, a node:

1. Gathers the current feature vectors (messages) of its neighbors.
2. Aggregates these messages with a permutation-invariant function such as sum, mean, or max.
3. Updates its own representation by combining the aggregated message with its current features, typically through learned weights and a non-linearity.
After k layers of message passing, a node's representation incorporates information from its k-hop neighborhood.
Let $h_v^{(l)}$ be the feature vector (embedding) of node $v$ at layer $l$. A simplified GNN layer update rule can be expressed as:

$$h_v^{(l+1)} = \sigma\left( W^{(l)} \cdot \text{AGGREGATE}\left( \{ h_u^{(l)} \mid u \in \mathcal{N}(v) \} \right) + B^{(l)} \cdot h_v^{(l)} \right)$$

Where:

- $\mathcal{N}(v)$ is the set of neighbors of node $v$.
- AGGREGATE is a permutation-invariant function (e.g., sum, mean, or max) applied to the neighbor embeddings.
- $W^{(l)}$ and $B^{(l)}$ are learnable weight matrices for layer $l$.
- $\sigma$ is a non-linear activation function such as ReLU.
Different choices for the AGGREGATE and UPDATE functions lead to various GNN architectures like Graph Convolutional Networks (GCN), GraphSAGE, or Graph Attention Networks (GAT).
You can implement GNN layers by subclassing tf.keras.layers.Layer. The core challenge lies in efficiently implementing the gather and aggregate steps using TensorFlow operations.
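As a simple illustration, the sketch below implements one message-passing update of the form shown earlier, using an edge-list representation, sum aggregation, and ReLU. The function name and weight shapes are hypothetical, and different GNN variants replace the aggregation and update with their own choices:

import tensorflow as tf

def message_passing_step(h, edge_list, W, B, num_nodes):
    # h: [num_nodes, dim] node embeddings; edge_list: [num_edges, 2] (source, target) pairs.
    sources, targets = edge_list[:, 0], edge_list[:, 1]
    # Gather: each edge carries the embedding of its source node as a "message".
    messages = tf.gather(h, sources)                                    # [num_edges, dim]
    # Aggregate: sum the incoming messages at each target node.
    aggregated = tf.math.unsorted_segment_sum(messages, targets, num_nodes)
    # Update: combine the aggregated neighborhood with the node's own embedding.
    return tf.nn.relu(tf.matmul(aggregated, W) + tf.matmul(h, B))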
Let's consider implementing a simple GCN-style layer. Assume we have node features X (shape [num_nodes, input_dim]) and a sparse adjacency matrix A_sparse representing $\hat{A} = A + I$, the adjacency matrix with self-loops added.
A simplified GCN layer propagation rule is:

$$H^{(l+1)} = \sigma\left( \hat{D}^{-1/2} \hat{A} \hat{D}^{-1/2} H^{(l)} W^{(l)} \right)$$

where $\hat{D}$ is the diagonal degree matrix of $\hat{A}$. This symmetric normalization keeps feature scales stable across layers and helps prevent exploding or vanishing gradients.
In TensorFlow, this might involve:

1. Preprocessing A_sparse to compute node degrees and apply the symmetric normalization (a sketch of this step follows the list).
2. Aggregating neighbor features with tf.sparse.sparse_dense_matmul(normalized_A_sparse, H_l).
3. Transforming the aggregated features and applying the non-linearity: activation(tf.matmul(aggregated_features, W_l)).
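A minimal sketch of the normalization step might look like the following; normalize_adjacency is a hypothetical helper and assumes a_hat_sparse already contains self-loops and float values:

import tensorflow as tf

def normalize_adjacency(a_hat_sparse):
    # Node degrees are the row sums of A_hat.
    degrees = tf.sparse.reduce_sum(a_hat_sparse, axis=1)    # [num_nodes]
    inv_sqrt_deg = tf.math.pow(degrees, -0.5)               # D^{-1/2} as a vector
    # Guard against isolated nodes (degree 0 would give inf).
    inv_sqrt_deg = tf.where(tf.math.is_finite(inv_sqrt_deg),
                            inv_sqrt_deg, tf.zeros_like(inv_sqrt_deg))
    # Scale each nonzero entry A_hat[i, j] by inv_sqrt_deg[i] * inv_sqrt_deg[j].
    row, col = a_hat_sparse.indices[:, 0], a_hat_sparse.indices[:, 1]
    scaled_values = (a_hat_sparse.values
                     * tf.gather(inv_sqrt_deg, row)
                     * tf.gather(inv_sqrt_deg, col))
    return tf.SparseTensor(a_hat_sparse.indices, scaled_values,
                           a_hat_sparse.dense_shape)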
Here's a conceptual structure for a custom GNN layer:
import tensorflow as tf

class SimpleGNNLayer(tf.keras.layers.Layer):
    def __init__(self, output_dim, activation='relu', **kwargs):
        super().__init__(**kwargs)
        self.output_dim = output_dim
        self.activation = tf.keras.activations.get(activation)
        # Weight matrices W and B could be created here, but this layer
        # creates them in build() once the input dimension is known.

    def build(self, input_shape):
        # Initialize weights based on input shape (node feature dimension).
        node_feature_shape = input_shape[0]  # Assuming input is (node_features, adj_info)
        input_dim = node_feature_shape[-1]
        self.kernel = self.add_weight(
            shape=(input_dim, self.output_dim),
            initializer='glorot_uniform',
            name='kernel')
        # Add other weights if needed (e.g., for transforming self-features separately).

    def call(self, inputs):
        node_features, adjacency_info = inputs  # e.g., adj_info could be a sparse tensor

        # 1. Aggregate neighbor features.
        # This step depends heavily on the chosen GNN variant and graph representation.
        # Example using sparse matmul (if adj_info is a normalized sparse adjacency):
        #   aggregated_neighbors = tf.sparse.sparse_dense_matmul(adjacency_info, node_features)
        aggregated_neighbors = self._aggregate(node_features, adjacency_info)

        # 2. Transform aggregated features (and potentially self-features).
        transformed_features = tf.matmul(aggregated_neighbors, self.kernel)
        # Optional: include self-features, e.g. with a separate transformation:
        #   transformed_self = tf.matmul(node_features, self.self_kernel)
        #   combined = transformed_features + transformed_self

        # 3. Apply activation.
        output = self.activation(transformed_features)
        return output

    def _aggregate(self, node_features, adjacency_info):
        # Implement the specific aggregation logic here based on the adjacency_info format.
        # Example: if adjacency_info is a normalized sparse matrix:
        if isinstance(adjacency_info, tf.SparseTensor):
            return tf.sparse.sparse_dense_matmul(adjacency_info, node_features)
        # Add other aggregation methods based on edge lists, dense matrices, etc.
        else:
            # Fallback for unsupported formats.
            # For simplicity, just return node_features (no actual aggregation).
            print("Warning: Aggregation placeholder used. Implement specific logic.")
            return node_features

    def get_config(self):
        config = super().get_config()
        config.update({
            'output_dim': self.output_dim,
            'activation': tf.keras.activations.serialize(self.activation)
        })
        return config
This basic structure highlights the main components: defining weights (build) and implementing the forward pass (call), including aggregation and transformation. The actual implementation of _aggregate is specific to the GNN variant and data representation.
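As a hypothetical usage sketch, two such layers could be stacked for node classification on a single small graph (random placeholder data; adjacency normalization is omitted here for brevity):

# Tiny random graph: 4 nodes, 8 input features, undirected edges plus self-loops.
num_nodes, input_dim = 4, 8
node_features = tf.random.normal([num_nodes, input_dim])
indices = tf.constant([[0, 1], [1, 0], [1, 2], [2, 1], [2, 3], [3, 2],
                       [0, 0], [1, 1], [2, 2], [3, 3]], dtype=tf.int64)
adjacency = tf.sparse.reorder(
    tf.SparseTensor(indices, tf.ones([10]), [num_nodes, num_nodes]))

# Stack two layers; the second outputs, say, 3 classes as logits.
layer1 = SimpleGNNLayer(output_dim=16, activation='relu')
layer2 = SimpleGNNLayer(output_dim=3, activation=None)

hidden = layer1((node_features, adjacency))
logits = layer2((hidden, adjacency))
print(logits.shape)  # (4, 3)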
While implementing GNNs from scratch using core TensorFlow provides deep understanding, several libraries, such as TF-GNN (TensorFlow GNN) and Spektral, simplify this process. These libraries abstract away much of the low-level implementation detail, allowing you to build complex GNN models more rapidly. However, understanding the underlying principles discussed here remains important for customization and troubleshooting.
GNNs represent a powerful tool for learning from graph-structured data. By understanding how to represent graphs and implement the core message-passing mechanism in TensorFlow, you can start applying these techniques to a wide range of problems where relationships and connections between data points are significant.