Standard deep learning models like Multi-Layer Perceptrons (MLPs), Convolutional Neural Networks (CNNs), and Recurrent Neural Networks (RNNs) have achieved remarkable success on tasks involving images, text, and tabular data. However, their core architectural assumptions break down when applied directly to graph-structured data. The unique properties of graphs present several fundamental challenges that these conventional architectures are not equipped to handle.
Consider a basic MLP, a fully connected network designed to accept a fixed-size feature vector as input. This presents an immediate problem: graphs do not have a fixed size. A social network can grow from 1,000 to 1,001 users, and a molecular database might contain compounds with different numbers of atoms. There is no straightforward way to create a single, fixed-size input vector for a collection of graphs that vary in their number of nodes and edges.
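A minimal NumPy sketch makes the mismatch concrete (the two graphs here are hypothetical, represented by their adjacency matrices): flattening graphs of different sizes produces vectors of different lengths, so no single MLP input layer can accept them all.

```python
import numpy as np

# Two hypothetical graphs of different sizes, as adjacency matrices.
graph_small = np.zeros((3, 3))  # e.g., a 3-atom molecule
graph_large = np.zeros((5, 5))  # e.g., a 5-atom molecule

# Flattening yields inputs of different lengths: 9 vs. 25.
# An MLP's first layer expects exactly one input dimension,
# so it cannot handle both graphs.
print(graph_small.flatten().shape)  # (9,)
print(graph_large.flatten().shape)  # (25,)
```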
Even more significant is the problem of node ordering. Graphs have no canonical order. The two diagrams below represent the exact same graph structure, but the nodes are listed in a different order.
A simple directed graph structure.
If we represent this graph with an adjacency matrix, the matrix's appearance depends entirely on the arbitrary order we choose for the nodes. Suppose the graph's edges are A→B and B→C. For the ordering (A, B, C), the adjacency matrix is:

$$\mathbf{A}_1 = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}$$
However, if we choose the ordering (B, C, A), we get a completely different matrix, $\mathbf{A}_2$:

$$\mathbf{A}_2 = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 1 & 0 & 0 \end{bmatrix}$$
An MLP would treat the flattened versions of $\mathbf{A}_1$ and $\mathbf{A}_2$ as completely different inputs, failing to recognize that they describe the same object. A model for graphs must be permutation invariant (for graph-level tasks, the output is unchanged when the nodes are reordered) or permutation equivariant (for node-level tasks, the outputs are reordered in exactly the same way as the nodes), so that predictions do not depend on the arbitrary ordering of nodes. Standard MLPs lack both properties.
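This is easy to verify numerically. The sketch below builds $\mathbf{A}_1$ for the edge set assumed above, applies the reordering as a permutation matrix $\mathbf{P}$ via $\mathbf{P}\mathbf{A}_1\mathbf{P}^\top$, and confirms that the flattened inputs an MLP would see are different.

```python
import numpy as np

# Adjacency matrix for edges A->B and B->C under ordering (A, B, C).
A1 = np.array([[0, 1, 0],
               [0, 0, 1],
               [0, 0, 0]])

# Permutation matrix taking the ordering (A, B, C) to (B, C, A):
# row i of P selects the old index of the new i-th node.
P = np.array([[0, 1, 0],   # new node 0 is B (old index 1)
              [0, 0, 1],   # new node 1 is C (old index 2)
              [1, 0, 0]])  # new node 2 is A (old index 0)

A2 = P @ A1 @ P.T
print(A2)
# [[0 1 0]
#  [0 0 0]
#  [1 0 0]]

# Same graph, yet an MLP would receive two different flattened vectors.
print(np.array_equal(A1.flatten(), A2.flatten()))  # False
```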
Convolutional Neural Networks have become the standard for computer vision tasks because they are designed to exploit the spatial locality of pixels in a grid. A core operation in a CNN is the convolution, where a small, learnable filter (or kernel) slides across the image. This works because every pixel's neighborhood is a regular, fixed-size grid (e.g., 3x3 or 5x5).
Graphs do not share this grid-like regularity. A node's local neighborhood is defined by its connections, and its size can vary dramatically. One node in a citation network may be connected to two others, while a foundational paper might be connected to two thousand. There is no concept of "up," "down," "left," or "right," and no way to apply a fixed-size filter that makes sense for every node.
The regular, grid-like neighborhood processed by CNNs (left) versus the irregular, variable-sized neighborhood of a node in a graph (right).
Forcing a graph into a grid format would discard the precise relational information that defines its structure, which is often the most important information we want the model to learn from.
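A short sketch with a hypothetical citation network (stored as an adjacency list) shows why no fixed-size filter makes sense: every node has a differently sized, unordered neighborhood.

```python
# A hypothetical undirected citation network as an adjacency list:
# each node maps to the set of its neighbors.
graph = {
    "paper_a": {"paper_b", "paper_c"},
    "paper_b": {"paper_a", "paper_c"},
    "paper_c": {"paper_a", "paper_b", "paper_d", "paper_e"},
    "paper_d": {"paper_c"},
    "paper_e": {"paper_c"},
}

# A 3x3 CNN filter always sees exactly 9 pixels in a fixed spatial
# layout. Here, each node's neighborhood is a set of varying size
# with no notion of "up", "down", "left", or "right".
for node, neighbors in graph.items():
    print(node, "has", len(neighbors), "neighbors")
```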
Recurrent Neural Networks are built for sequences where order carries meaning, such as words in a sentence or events in a time series. One might attempt to apply an RNN to a graph by performing a "walk" to generate a sequence of nodes. However, just as there is no canonical node ordering, there is no canonical path through a graph.
Starting a walk from a different node, or choosing a different path at an intersection, produces a completely different sequence. Feeding these varied sequences into an RNN would yield inconsistent representations for the exact same graph, making it impossible for the model to learn stable and meaningful patterns.
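The instability is easy to demonstrate with a toy graph (hypothetical, stored as an adjacency list): random walks from different starting nodes over the exact same structure hand an RNN entirely different sequences.

```python
import random

# A hypothetical undirected graph as an adjacency list.
graph = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}

def random_walk(graph, start, length):
    """Return one node sequence produced by walking randomly from start."""
    walk = [start]
    for _ in range(length - 1):
        walk.append(random.choice(graph[walk[-1]]))
    return walk

random.seed(0)
# Two walks over the same graph yield different sequences, so an RNN
# would see inconsistent inputs for one and the same object.
print(random_walk(graph, "a", 5))
print(random_walk(graph, "d", 5))
```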
In summary, these limitations all point to a central challenge: standard neural networks are not designed to process the explicit relational structure encoded in a graph's topology. An MLP discards this structure by flattening it, a CNN requires a regular grid that graphs lack, and an RNN imposes a sequential order that is artificial. To perform machine learning on graphs effectively, we need a different class of models that operates directly on the graph, treating nodes and their connections as fundamental components of the computation. This is precisely the function of Graph Neural Networks.
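As a small preview of the property such models need, a sum over node features is permutation invariant: reordering the nodes leaves a graph-level readout unchanged. A minimal sketch with hypothetical node features:

```python
import numpy as np

# Hypothetical node feature matrix X: one row per node.
X = np.array([[1.0, 0.0],
              [0.0, 2.0],
              [3.0, 1.0]])

# Reorder the nodes as (B, C, A) by permuting the rows.
X_permuted = X[[1, 2, 0]]

# Summing over nodes gives the same graph-level vector either way,
# unlike the flattened adjacency inputs an MLP would consume.
print(X.sum(axis=0))           # [4. 3.]
print(X_permuted.sum(axis=0))  # [4. 3.]
```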