A machine learning model is fundamentally a series of mathematical transformations applied to data. While your Python code might execute these transformations sequentially, line by line, the underlying logic often permits a more flexible execution strategy. To exploit this flexibility, ML compilers convert linear source code into a dataflow graph. This graph explicitly maps out operations and the dependencies between them, giving the compiler a complete view of how data moves through the system.

## Anatomy of a Dataflow Graph

In the context of compiler Intermediate Representations (IR), a dataflow graph is typically structured as a Directed Acyclic Graph (DAG). This structure consists of two primary elements:

- **Nodes (vertices):** These represent the computational operations, or "ops". In a deep learning context, a node could be a low-level primitive like an addition or multiplication, or a high-level operator like a 2D convolution or softmax.
- **Edges (links):** These represent the flow of data. An edge connecting Node A to Node B indicates that the output of Node A is a required input for Node B. These edges carry tensor data, including shape and type information.

Unlike a standard control-flow graph used in general-purpose compilers (which focuses on the order of instructions and jumps), a dataflow graph focuses on the availability of data. An operation in this graph becomes executable the moment all of its input edges have valid data.

Consider a simple linear layer computation: $Y = (X \times W) + b$.

In Python, this looks like a sequence of variable assignments. In the IR, it is a graph where $X$ and $W$ flow into a matrix multiplication node, the result flows into an addition node, and $b$ flows into that same addition node as the second argument.

```dot
digraph G {
    rankdir=TB;
    bgcolor="#ffffff";
    node [style=filled, shape=box, fontname="Arial", fontsize=10, margin=0.2];
    edge [fontname="Arial", fontsize=9, color="#adb5bd"];

    Input_X [label="Input X", fillcolor="#a5d8ff", color="#1c7ed6"];
    Input_W [label="Weights W", fillcolor="#a5d8ff", color="#1c7ed6"];
    Input_b [label="Bias b", fillcolor="#a5d8ff", color="#1c7ed6"];
    MatMul [label="MatMul", fillcolor="#ffc9c9", color="#fa5252"];
    Add [label="Add", fillcolor="#ffc9c9", color="#fa5252"];
    Output [label="Output Y", fillcolor="#b2f2bb", color="#40c057"];

    Input_X -> MatMul;
    Input_W -> MatMul;
    MatMul -> Add;
    Input_b -> Add;
    Add -> Output;
}
```

*A dataflow graph representing a linear layer operation. The structure explicitly shows that the bias addition cannot occur until the matrix multiplication completes.*

## Dependencies and Execution Order

The primary purpose of the dataflow graph is to enforce correctness through dependencies. A dependency exists when one operation relies on the result of another. These are strict constraints: the compiler cannot reorder operations in a way that violates these edges.

However, the graph also reveals where dependencies do *not* exist, and this is the most valuable insight for optimization. If two nodes have no directed path connecting them, they are independent.

### Serial vs. Parallel Execution

In an imperative script, independent operations are often written sequentially simply because the syntax requires it.

```python
# Python executes these statements serially
x = load_image()
feat_a = extract_features_method_A(x)  # Step 1
feat_b = extract_features_method_B(x)  # Step 2
result = concatenate(feat_a, feat_b)   # Step 3
```

To the interpreter, Step 1 must finish before Step 2 begins. However, the dataflow graph reveals that `feat_a` and `feat_b` both depend only on `x`. There is no dependency between `extract_features_method_A` and `extract_features_method_B`.

This lack of dependency allows the compiler to schedule these operations to run in parallel on hardware that supports it (such as a GPU with multiple streams or a multi-core CPU). The graph transforms the program from a strict sequence into a set of opportunities.

```dot
digraph G {
    rankdir=TB;
    bgcolor="#ffffff";
    node [style=filled, shape=rect, fontname="Arial", fontsize=10];
    edge [color="#adb5bd"];

    Input [label="Input Image", fillcolor="#eebefa", color="#be4bdb"];
    BranchA [label="Conv2D (3x3)", fillcolor="#91a7ff", color="#4c6ef5"];
    BranchB [label="Conv2D (5x5)", fillcolor="#91a7ff", color="#4c6ef5"];
    Concat [label="Concatenate", fillcolor="#63e6be", color="#12b886"];

    Input -> BranchA;
    Input -> BranchB;
    BranchA -> Concat;
    BranchB -> Concat;
}
```

*This branching structure illustrates independent paths. The two convolution operations share an input but function independently, allowing the compiler to schedule them concurrently.*

## Static Single Assignment (SSA)

Most modern ML compilers (including TVM, XLA, and MLIR) use a form of Intermediate Representation known as Static Single Assignment (SSA). In SSA, every tensor is assigned exactly once. You cannot modify a tensor in place within the graph definition; instead, any operation that appears to modify data actually produces a new tensor version.

This immutability simplifies dependency analysis. The compiler does not need to track complex state changes or worry about memory aliasing (where two variables point to the same memory location) during the graph analysis phase. If Node B uses tensor $T_1$ and we want to optimize the graph by changing how $T_1$ is produced, we can do so knowing exactly which downstream nodes are affected.

## From Graph to Schedule

Once the dependencies are mapped, the compiler performs a topological sort to determine a valid linear execution order. A topological sort is an ordering of nodes such that for every directed edge from Node A to Node B, Node A appears before Node B in the ordering.

For a complex graph, there are often multiple valid topological sorts, each with different trade-offs:

- **Depth-first:** Prioritizes finishing one branch of computation before starting another. This can reduce peak memory usage by quickly freeing intermediate tensors.
- **Breadth-first:** Prioritizes executing all available nodes at the current depth. This can maximize parallelism but may increase the memory footprint, since more intermediate results are kept alive simultaneously.
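To make this concrete, here is a minimal sketch of the kind of scheduling pass a compiler might run: Kahn's algorithm applied to the linear-layer graph shown earlier. The dictionary-based graph representation and node names are illustrative only, not the IR of any particular framework.

```python
from collections import deque

# Edges of the linear-layer graph: producer -> list of consumers.
# This adjacency-list representation is illustrative, not a real compiler IR.
graph = {
    "X": ["MatMul"],
    "W": ["MatMul"],
    "b": ["Add"],
    "MatMul": ["Add"],
    "Add": ["Output"],
    "Output": [],
}

def topological_sort(graph):
    """Return one valid execution order using Kahn's algorithm."""
    # Count how many unresolved inputs (incoming edges) each node has.
    indegree = {node: 0 for node in graph}
    for consumers in graph.values():
        for consumer in consumers:
            indegree[consumer] += 1

    # Nodes with no pending inputs are ready to execute immediately.
    ready = deque(node for node, deg in indegree.items() if deg == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        # "Executing" a node satisfies one input of each of its consumers.
        for consumer in graph[node]:
            indegree[consumer] -= 1
            if indegree[consumer] == 0:
                ready.append(consumer)

    if len(order) != len(graph):
        raise ValueError("Graph contains a cycle; not a valid DAG")
    return order

print(topological_sort(graph))
# Prints ['X', 'W', 'b', 'MatMul', 'Add', 'Output'] -- one valid order among several
```

Notice that the `ready` queue can hold several nodes at once (`X`, `W`, and `b` at the start). How a scheduler drains that queue, one branch at a time versus everything currently available, is exactly the depth-first versus breadth-first trade-off described above.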
Compiler optimizations often involve manipulating the graph structure, fusing nodes or rewriting subgraphs, while maintaining the validity of these data dependencies. For instance, if a node is removed (Dead Code Elimination), the compiler must first confirm that no remaining node depends on its output. If nodes are fused, the dependencies of the new fused node are the union of the dependencies of the original nodes.

Understanding the dataflow graph is the first step in seeing code not as text, but as a structured network of operations. This perspective is what enables the powerful transformations we will discuss in the following chapters on graph-level optimization.
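As a closing illustration of the dependency bookkeeping described above, the sketch below performs a simple dead code elimination pass: any node whose output is never (transitively) consumed by the graph's outputs is dropped. The consumer-to-producer dictionary and the extra `DeadBranch` node are hypothetical, added only to mirror the examples in this chapter.

```python
# Consumer -> list of producers (inputs). "DeadBranch" computes a value
# that nothing downstream ever reads. Hypothetical example, not a real IR.
inputs = {
    "X": [],
    "W": [],
    "b": [],
    "MatMul": ["X", "W"],
    "Add": ["MatMul", "b"],
    "DeadBranch": ["X"],   # produced but never consumed
    "Output": ["Add"],
}

def eliminate_dead_code(inputs, outputs):
    """Keep only nodes reachable (backwards) from the graph outputs."""
    live = set()
    stack = list(outputs)
    while stack:
        node = stack.pop()
        if node in live:
            continue
        live.add(node)
        # A node's producers are needed whenever the node itself is needed.
        stack.extend(inputs[node])
    return {node: deps for node, deps in inputs.items() if node in live}

pruned = eliminate_dead_code(inputs, outputs=["Output"])
print(sorted(pruned))
# Prints ['Add', 'MatMul', 'Output', 'W', 'X', 'b'] -- "DeadBranch" is gone
```

Working backwards from the declared outputs is one way to guarantee the condition stated earlier: a node is only removed when no surviving node depends on its result.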