Modern deep learning compilers distinguish themselves from traditional language compilers by managing two distinct levels of abstraction. While a standard C++ compiler might lower source code directly into an instruction-level intermediate representation (IR) like LLVM IR, an AI compiler must first reason about the structure of the mathematics before it worries about the structure of the assembly code. This necessity gives rise to a multi-level IR architecture, split primarily into Graph-Level (High-Level) IR and Kernel-Level (Low-Level) IR.

The Graph-Level Intermediate Representation

The High-Level IR is the compiler's view of the model immediately after parsing the frontend framework code (such as PyTorch or TensorFlow). At this stage, the representation remains declarative: the compiler knows what needs to be computed but has not yet decided how to compute it.

In a Graph-Level IR, the fundamental unit of data is the Tensor. The compiler tracks properties such as data type (float32, int8), shape (dimensions), and memory layout (NCHW vs NHWC). The fundamental unit of execution is the Operator. A single node in this graph, such as Conv2D or Softmax, represents an enormous number of underlying arithmetic operations.

Consider a typical layer in a transformer model. In High-Level IR, a fully connected layer followed by an activation function appears as a minimal set of nodes. The complexity of the underlying loops is hidden behind the operator abstraction.

```dot
digraph G {
  rankdir=TB;
  node [shape=box, style=filled, fontname="Helvetica", fontsize=10];
  edge [fontname="Helvetica", fontsize=9, color="#868e96"];

  input   [label="Input Tensor\n[Batch, 128]",  fillcolor="#a5d8ff", color="#228be6"];
  weights [label="Weights\n[128, 512]",         fillcolor="#ffe066", color="#f59f00"];
  bias    [label="Bias\n[512]",                 fillcolor="#ffe066", color="#f59f00"];
  matmul  [label="Op: MatMul",                  fillcolor="#ffc9c9", color="#fa5252"];
  add     [label="Op: Add",                     fillcolor="#ffc9c9", color="#fa5252"];
  relu    [label="Op: ReLU",                    fillcolor="#ffc9c9", color="#fa5252"];
  output  [label="Output Tensor\n[Batch, 512]", fillcolor="#b2f2bb", color="#40c057"];

  input -> matmul;
  weights -> matmul;
  matmul -> add [label="Intermediate\n[Batch, 512]"];
  bias -> add;
  add -> relu;
  relu -> output;
}
```

A high-level graph representation of a dense layer. The nodes represent logical mathematical operations rather than CPU instructions.

The primary utility of this level is global optimization. Because the compiler sees the entire topology of the neural network, it can perform algebraic simplifications and structural changes. For example, if the compiler encounters the sequence:

$$Y = \text{ReLU}(\text{Add}(\text{Conv2D}(X, W), B))$$

it does not need to worry about register allocation or cache lines yet. It focuses on Operator Fusion, asking whether these three operations can be merged into a single kernel call to minimize memory round-trips to global memory (VRAM).

The Lowering Process

The transition from High-Level to Low-Level IR is often referred to as "lowering." This process is destructive with respect to semantics. Once a Conv2D node is lowered, the compiler loses the semantic knowledge that it is performing a convolution. Instead, the operation is replaced by its implementation details: nested loops, memory loads, multiply-accumulate instructions, and stores.

This distinction is important because certain optimizations are only possible at specific levels. You cannot effectively perform operator fusion once the code has been shattered into loops, as the structure is too obscured. Conversely, you cannot perform loop tiling or vectorization at the graph level because loops do not formally exist there yet.
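To make the graph-level rewrite concrete, the sketch below implements a toy fusion pass over a hypothetical node representation. The Node class and the FusedDense operator name are illustrative assumptions, not any framework's actual API; the point is the shape of the transformation: match the MatMul, Add, ReLU chain from the figure above and replace it with a single operator.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    op: str                                  # logical operator, e.g. "MatMul", "Add", "ReLU"
    inputs: List["Node"] = field(default_factory=list)

def fuse_dense_block(node: Node) -> Node:
    """Rewrite ReLU(Add(MatMul(x, w), b)) subgraphs into one FusedDense node."""
    node.inputs = [fuse_dense_block(i) for i in node.inputs]   # rewrite producers first
    if (node.op == "ReLU" and node.inputs
            and node.inputs[0].op == "Add" and node.inputs[0].inputs
            and node.inputs[0].inputs[0].op == "MatMul"):
        add, matmul = node.inputs[0], node.inputs[0].inputs[0]
        # One fused node: the intermediate [Batch, 512] tensors never hit global memory.
        return Node("FusedDense", matmul.inputs + add.inputs[1:])
    return node

# Example: the dense layer from the figure above.
x, w, b = Node("Input"), Node("Weights"), Node("Bias")
y = Node("ReLU", [Node("Add", [Node("MatMul", [x, w]), b])])
print(fuse_dense_block(y).op)   # -> FusedDense
```

Production compilers drive the same idea through declarative pattern-rewrite infrastructure rather than hand-written traversals, but the structure is identical: match a subgraph, replace it with one node.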
The Kernel-Level Intermediate Representation

Low-Level IR resembles a restricted subset of C or assembly language. It is imperative rather than declarative. At this stage, the abstraction shifts from Tensors to Buffers (blocks of allocated memory with pointers) and from Operators to Loop Nests.

In frameworks like Apache TVM (specifically the Tensor Intermediate Representation, or TIR) or MLIR's Affine/Linalg dialects, the Low-Level IR makes explicit everything that was implicit in the graph. The compiler must now manage:

- Iteration Domains: The bounds of loops ($\text{for } i = 0 \text{ to } N$).
- Memory Access: Calculating specific indices into flat memory buffers ($A[i \times \text{stride} + j]$).
- Scope and Storage: Determining whether data resides in global memory, shared memory (GPU), or registers.

Consider the same matrix multiplication from the graph example. In Low-Level IR, it is expanded into an iterative structure amenable to hardware-specific tuning.

```dot
digraph LowLevel {
  rankdir=TB;
  node [shape=note, style=filled, fontname="Courier", fontsize=10];
  edge [color="#adb5bd"];

  alloc   [label="Allocate Buffer C[M*N]",   fillcolor="#eebefa", color="#be4bdb"];
  loop_i  [label="For i = 0 to M",           fillcolor="#d0bfff", color="#7950f2"];
  loop_j  [label="For j = 0 to N",           fillcolor="#d0bfff", color="#7950f2"];
  loop_k  [label="For k = 0 to K",           fillcolor="#d0bfff", color="#7950f2"];
  compute [label="Reg_C += A[i,k] * B[k,j]", fillcolor="#ffc9c9", color="#fa5252"];
  store   [label="Store C[i,j] <- Reg_C",    fillcolor="#99e9f2", color="#15aabf"];

  alloc -> loop_i;
  loop_i -> loop_j;
  loop_j -> loop_k;
  loop_k -> compute;
  loop_j -> store [label="Post-reduction"];
}
```

Representation of the same operation after lowering. The focus shifts to iteration order, memory allocation, and scalar computation.

This level is where the "heavy lifting" of performance engineering occurs. The compiler attempts to reshape these loop nests to maximize data locality. Techniques such as Loop Tiling (breaking large loops into smaller blocks to fit in cache) and Vectorization (using SIMD instructions like AVX-512 or NEON) are applied here.
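The effect of tiling is easiest to see in plain code. The sketch below is a minimal Python rendering of the lowered loop nest from the figure, with the three loops split into tile-sized blocks. It is an illustration of the transformation only, not production kernel code; a real compiler would emit this structure in TIR or C rather than interpreted Python.

```python
import numpy as np

def matmul_tiled(A: np.ndarray, B: np.ndarray, tile: int = 32) -> np.ndarray:
    """Kernel-level view of C = A @ B: an explicit loop nest with loop tiling.

    The outer loops walk over tile-sized blocks so the working set of A and B
    stays resident in cache while the inner loops run.
    """
    M, K = A.shape
    K2, N = B.shape
    assert K == K2
    C = np.zeros((M, N), dtype=A.dtype)
    for i0 in range(0, M, tile):                 # blocked i loop
        for j0 in range(0, N, tile):             # blocked j loop
            for k0 in range(0, K, tile):         # blocked reduction loop
                for i in range(i0, min(i0 + tile, M)):
                    for j in range(j0, min(j0 + tile, N)):
                        acc = C[i, j]            # accumulator held in a "register"
                        for k in range(k0, min(k0 + tile, K)):
                            acc += A[i, k] * B[k, j]
                        C[i, j] = acc
    return C

# Sanity check against the library implementation.
A = np.random.rand(64, 48).astype(np.float32)
B = np.random.rand(48, 80).astype(np.float32)
assert np.allclose(matmul_tiled(A, B), A @ B, atol=1e-4)
```

The arithmetic is unchanged; only the iteration order differs, and that iteration order is exactly the degree of freedom the kernel-level IR exposes for tuning.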
Comparative Analysis of IR Characteristics

To architect an effective compiler pass, one must identify which IR level offers the correct primitives for the task. Attempting to perform memory planning at the graph level is imprecise because buffer lifespans are not fully determined. Attempting to perform dead code elimination at the loop level is inefficient compared to pruning the graph earlier.

The following breakdown illustrates the functional differences between these two abstraction layers.

| Feature | High-Level IR (Graph) | Low-Level IR (Kernel) |
| --- | --- | --- |
| Data Unit | Tensor (Shape + Type) | Buffer (Pointer + Offset) |
| Operations | Logical (Conv2D, MatMul) | Scalar (Load, Store, FMA) |
| Control Flow | Data Dependency Edges | Loops, If/Else, Jumps |
| Memory Model | Abstract / Implicit | Explicit (Alloc/Free) |
| Optimization Focus | Fusion, Layout, Simplification | Tiling, Vectorization, Unrolling |
| Example Systems | TVM Relay, XLA HLO, TF Graph | TVM TIR, MLIR Affine, LLVM IR |

Comparison of IR abstraction layers: a functional view distinguishing the scope and capabilities of graph-level versus kernel-level representations.

The Gray Area: Multi-Level Dialects

In modern infrastructures like MLIR (Multi-Level Intermediate Representation), the boundary between high and low levels is becoming less rigid. Instead of two monolithic states, the compiler may use a progressive lowering strategy involving several dialects.

For instance, the Linalg dialect in MLIR arguably sits in the middle. It represents operations like matrix multiplication compactly (retaining the semantic intent) but operates on buffers suitable for loop analysis. This allows the compiler to perform loop fusion, a technique usually reserved for loop-level IR, using graph-level information. This hybrid approach lets the compiler make tiling decisions while still being aware of the broader operator context, mitigating the "phase ordering" problem where optimizations at one level inadvertently pessimize the code for the next.

Understanding this dichotomy is essential for the practical session that follows. When inspecting TVM's Relay IR, you will see the declarative graph. As we move into later chapters on scheduling, you will see how that graph is lowered into the imperative TIR to be tuned for specific hardware constraints.
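As a preview of that workflow, the sketch below uses TVM's classic tensor-expression (te) API to declare a matrix multiplication, lower it to TIR, and then apply a tiling directive. The API surface varies between TVM releases (te.create_schedule belongs to the older schedule system), so treat this as an illustrative sketch rather than a canonical recipe.

```python
import tvm
from tvm import te

M, N, K = 1024, 1024, 1024

# Declarative stage: describe *what* to compute; no loops exist yet.
A = te.placeholder((M, K), name="A")
B = te.placeholder((K, N), name="B")
k = te.reduce_axis((0, K), name="k")
C = te.compute((M, N), lambda i, j: te.sum(A[i, k] * B[k, j], axis=k), name="C")

# Lowering: the schedule materializes an explicit TIR loop nest.
s = te.create_schedule(C.op)
print(tvm.lower(s, [A, B, C], simple_mode=True))   # three nested serial loops

# Kernel-level optimization: block the i and j loops into 32x32 tiles.
io, jo, ii, ji = s[C].tile(C.op.axis[0], C.op.axis[1], x_factor=32, y_factor=32)
print(tvm.lower(s, [A, B, C], simple_mode=True))   # loop nest now blocked into tiles
```

Printing the module before and after the tile call shows the same semantics expressed as two different loop structures, which is precisely the high-level versus low-level split described in this section.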