Optimizing computational kernels requires precise knowledge of how memory is accessed relative to loop iterations. While general-purpose intermediate representations (IRs) allow arbitrary arithmetic in array indices, this flexibility often prevents compilers from proving that a loop transformation is safe. If a compiler cannot guarantee that reordering loops preserves data dependencies, it must conservatively avoid the optimization. The MLIR Affine dialect addresses this challenge by imposing a restrictive but powerful structural constraint: all control flow and memory accesses must be defined using affine (linear-plus-constant) functions of loop induction variables, symbols, and constants.

This strict mathematical structure enables the use of polyhedral compilation techniques. By modeling loop nests as geometric polytopes, the compiler can apply exact dependency analysis, enabling aggressive transformations like loop tiling, skewing, and vectorization without the risk of changing the program's semantics.

## Affine Maps and Sets

The fundamental building blocks of this dialect are Affine Maps and Integer Sets. These constructs separate the index calculation logic from the operations themselves, allowing the compiler to analyze memory patterns mathematically.

An Affine Map defines a transformation from a list of dimensions and symbols to a list of results. It follows the form:

$$ (d_0, d_1, \dots, d_n)[s_0, s_1, \dots, s_m] \rightarrow (r_0, r_1, \dots, r_k) $$

Here, $d$ represents dimensions (typically loop induction variables), and $s$ represents symbols (runtime values that remain constant within the scope, such as image width or batch size). The result expressions $r$ must be linear combinations of $d$, $s$, and integer constants.

Consider a map used to access a flattened 2D array with a stride $W$:

```mlir
#flatten_map = affine_map<(i, j)[W] -> (i * W + j)>
```

In this definition, `i` and `j` are dimensions, and `W` is a symbol. The compiler can statically analyze this map to determine that incrementing `j` moves the memory access by 1 unit, while incrementing `i` moves it by `W` units.

Integer Sets serve a similar purpose but define a domain of valid points rather than a transformation. They are used for conditional execution, such as handling boundary conditions in tiled loops. A set consists of a list of affine constraints (equalities or inequalities) that must all hold:

$$ S = \{ (i, j) \mid 0 \le i < N,\ 0 \le j < M,\ i + j \ge 0 \} $$

## Structured Control Flow and Memory Access

The Affine dialect introduces specific operations for control flow (`affine.for`, `affine.if`) and memory manipulation (`affine.load`, `affine.store`). Unlike standard control flow, `affine.for` loops enforce that bounds are affine maps of outer loop variables and symbols.

This restriction guarantees that the iteration space forms a convex polyhedron. The following MLIR snippet demonstrates a nested loop computing a matrix addition, where the constraints are explicitly embedded in the IR structure:

```mlir
func.func @matrix_add(%A: memref<128x128xf32>, %B: memref<128x128xf32>,
                      %C: memref<128x128xf32>) {
  affine.for %i = 0 to 128 {
    affine.for %j = 0 to 128 {
      %a = affine.load %A[%i, %j] : memref<128x128xf32>
      %b = affine.load %B[%i, %j] : memref<128x128xf32>
      %sum = arith.addf %a, %b : f32
      affine.store %sum, %C[%i, %j] : memref<128x128xf32>
    }
  }
  return
}
```

In this example, the compiler knows that the memory access pattern is strictly `(i, j) -> (i, j)`. This trivial mapping simplifies dependency analysis. If we were to apply a tiling transformation, the compiler would introduce new maps to handle the intra-tile and inter-tile coordinates.
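As an illustration, here is a hand-written sketch of a 32x32 tiling of `@matrix_add`, in the style of what a pass such as `-affine-loop-tile` produces; the map aliases and the function name are hypothetical. The outer loops step between tile origins, while the inner bounds are affine maps of the outer induction variables, so the whole nest remains analyzable:

```mlir
// Hand-written sketch of a 32x32 tiling (names hypothetical).
#tile_lb = affine_map<(d0) -> (d0)>        // tile origin
#tile_ub = affine_map<(d0) -> (d0 + 32)>   // origin + tile size

func.func @matrix_add_tiled(%A: memref<128x128xf32>, %B: memref<128x128xf32>,
                            %C: memref<128x128xf32>) {
  // Inter-tile loops: visit the tile origins 0, 32, 64, 96.
  affine.for %ti = 0 to 128 step 32 {
    affine.for %tj = 0 to 128 step 32 {
      // Intra-tile loops: bounds are affine maps of %ti and %tj.
      affine.for %i = #tile_lb(%ti) to #tile_ub(%ti) {
        affine.for %j = #tile_lb(%tj) to #tile_ub(%tj) {
          %a = affine.load %A[%i, %j] : memref<128x128xf32>
          %b = affine.load %B[%i, %j] : memref<128x128xf32>
          %sum = arith.addf %a, %b : f32
          affine.store %sum, %C[%i, %j] : memref<128x128xf32>
        }
      }
    }
  }
  return
}
```

Because 128 divides evenly by 32, no `min` expressions are needed in the upper bounds; a non-divisible problem size would add them to handle partial boundary tiles.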
The following diagram illustrates how the Affine dialect structures a loop nest. The iteration space is defined by bounds, and the memory access is derived via an affine map applied to the induction variables.

```dot
digraph G {
  rankdir=TB;
  node [shape=box, style=filled, fillcolor="#f8f9fa", color="#adb5bd", fontname="Helvetica"];
  edge [color="#495057", fontname="Helvetica"];

  subgraph cluster_loop {
    label = "Affine Loop Scope";
    style = dashed;
    color = "#adb5bd";
    IV [label="Induction Variables\n(%i, %j)", fillcolor="#eebefa"];
    Bounds [label="Loop Bounds\n(0 to 128)", fillcolor="#d0bfff"];
    Body [label="Loop Body", fillcolor="#ffffff"];
  }

  Params [label="Symbols / Parameters\n(e.g., Array Size)", shape=ellipse, fillcolor="#ffc9c9"];
  Map [label="Affine Map\n(d0, d1) -> (d0, d1)", shape=hexagon, fillcolor="#96f2d7"];
  Memory [label="Memory Address", fillcolor="#a5d8ff"];

  Params -> Bounds;
  Bounds -> IV [label="Defines Range"];
  IV -> Body;
  IV -> Map [label="Input dims"];
  Params -> Map [label="Input symbols"];
  Map -> Memory [label="Calculates Index"];
}
```

*Structure of an affine loop nest showing how induction variables and symbols feed into an affine map to determine precise memory addresses.*

## Dependency Analysis and Polyhedral Optimization

The primary advantage of the Affine dialect is its support for precise dependency analysis. In traditional optimization, checking whether two pointers alias is often undecidable. In the affine context, data dependence checking reduces to solving a system of linear inequalities, typically using integer linear programming (ILP) or Fourier-Motzkin elimination.

If the compiler wants to parallelize the loop shown above, it checks whether any iteration $(i', j')$ depends on a result computed in iteration $(i, j)$. Since the reads and writes occur at the exact same coordinate $(i, j)$ and there are no cross-iteration dependencies (such as `C[i, j] = C[i-1, j]`), the compiler can mathematically prove that all iterations are independent.
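That proof has a direct syntactic counterpart in the IR. As a minimal sketch (the function name is hypothetical), a pass along the lines of MLIR's `-affine-parallelize` can rewrite the proven-independent nest using `affine.parallel`:

```mlir
// Sketch: the independent nest expressed with affine.parallel.
func.func @matrix_add_parallel(%A: memref<128x128xf32>, %B: memref<128x128xf32>,
                               %C: memref<128x128xf32>) {
  // Both dimensions iterate in parallel; no ordering constraints remain.
  affine.parallel (%i, %j) = (0, 0) to (128, 128) {
    %a = affine.load %A[%i, %j] : memref<128x128xf32>
    %b = affine.load %B[%i, %j] : memref<128x128xf32>
    %sum = arith.addf %a, %b : f32
    affine.store %sum, %C[%i, %j] : memref<128x128xf32>
  }
  return
}
```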
This capability is what enables Loop Tiling (or blocking). Tiling partitions the loop iteration space into smaller blocks to fit the working set into the CPU cache or GPU shared memory. In the Affine dialect, tiling is not just a heuristic; it is a coordinate transformation.

The chart below visualizes the first four 4x4 tiles of an iteration space in which $i$ and $j$ each range from 0 to 16. The color coding represents the tiling strategy: the compiler transforms the original loops into a set of loops that visit these colored blocks sequentially, improving cache locality.

```json
{
  "layout": {
    "title": "2D Iteration Space with 4x4 Tiling",
    "xaxis": {"title": "Dimension j", "showgrid": true, "zeroline": false},
    "yaxis": {"title": "Dimension i", "showgrid": true, "zeroline": false},
    "width": 600, "height": 500, "template": "simple_white"
  },
  "data": [
    {"x": [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3],
     "y": [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
     "mode": "markers", "type": "scatter",
     "marker": {"size": 10, "color": "#4dabf7"}, "name": "Tile (0,0)"},
    {"x": [4, 5, 6, 7, 4, 5, 6, 7, 4, 5, 6, 7, 4, 5, 6, 7],
     "y": [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
     "mode": "markers", "type": "scatter",
     "marker": {"size": 10, "color": "#ffa8a8"}, "name": "Tile (0,1)"},
    {"x": [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3],
     "y": [4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7],
     "mode": "markers", "type": "scatter",
     "marker": {"size": 10, "color": "#69db7c"}, "name": "Tile (1,0)"},
    {"x": [4, 5, 6, 7, 4, 5, 6, 7, 4, 5, 6, 7, 4, 5, 6, 7],
     "y": [4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7],
     "mode": "markers", "type": "scatter",
     "marker": {"size": 10, "color": "#ffd43b"}, "name": "Tile (1,1)"}
  ]
}
```

*Visualization of a tiled iteration space. The compiler groups execution into blocks (colors) to maximize data reuse, a transformation validated by affine analysis.*

## Progressive Lowering

The Affine dialect is rarely the final target. It acts as a high-level analysis and transformation layer. Once optimizations such as tiling, unroll-and-jam, or vectorization have been applied, the affine constructs are typically lowered to the SCF (Structured Control Flow) dialect or directly to the LLVM dialect.

During lowering, the abstract `affine.for` loops are converted into standard loops with explicit arithmetic for index calculations, and `affine.load` operations become standard pointer arithmetic. This separation of concerns allows the Affine dialect to focus purely on transformation legality without worrying about low-level implementation details like pointer bit-casting or machine-specific intrinsics.

You generally use the Affine dialect when you need to perform heavy loop transformations on dense tensors. However, if your computation involves indirect access (e.g., sparse matrices where $A[B[i]]$ is accessed), the Affine dialect cannot represent the pattern, and you must fall back to the more generic SCF or standard dialects, losing the ability to perform polyhedral optimization, as the sketch below illustrates.
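Here is a minimal sketch of that fallback (function name and buffer shapes are hypothetical): a gather-style access $A[B[i]]$ written with `scf.for` and plain `memref` operations, since the loaded index has no affine relationship to the induction variable:

```mlir
// Sketch: indirect access A[B[i]] is outside the affine restriction,
// so it must be expressed with scf.for and ordinary memref ops.
func.func @gather(%A: memref<1024xf32>, %B: memref<128xindex>,
                  %out: memref<128xf32>) {
  %c0 = arith.constant 0 : index
  %c1 = arith.constant 1 : index
  %c128 = arith.constant 128 : index
  scf.for %i = %c0 to %c128 step %c1 {
    // %idx is a runtime value with no affine relation to %i, so
    // polyhedral dependence analysis cannot reason about A[%idx].
    %idx = memref.load %B[%i] : memref<128xindex>
    %v = memref.load %A[%idx] : memref<1024xf32>
    memref.store %v, %out[%i] : memref<128xf32>
  }
  return
}
```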