A primary distinction in machine learning compilation is whether the dimensions of the tensors are known at compile time or determined at runtime. This distinction defines the strategy for memory management and the specific optimizations the compiler can apply to the generated code.

## The Case for Static Shapes

In a static shape regime, every dimension of every tensor in the computation graph is a fixed integer known before the program runs. This is common in computer vision models. For instance, a standard ResNet50 implementation often expects an input of exactly $(1, 3, 224, 224)$.

When the compiler encounters static shapes, it operates with complete information. It can calculate the exact memory footprint required for the entire inference process. Instead of allocating and freeing memory dynamically during execution, the compiler can create a static memory plan: it pre-calculates offsets for every tensor in a single contiguous block of memory, often referred to as a memory arena.

Furthermore, static shapes enable aggressive loop optimizations. If a loop iterates over a dimension of size 64, the compiler can unroll that loop or map it precisely to a vector unit of width 8 or 16. It does not need to insert boundary checks or handle remainder loops for cases where the data size does not align with the hardware vector width.

*Figure: Comparison of compilation flows. Static shapes allow direct code generation with fixed bounds, while dynamic shapes require intermediate shape resolution steps.*

## The Flexibility of Dynamic Shapes

Real-world applications often defy fixed dimensions. Natural language processing models process sentences of varying lengths. Object detection models output a variable number of bounding boxes depending on the image content. In these scenarios, one or more dimensions are symbolic.

A dynamic shape is represented in the IR not as a literal integer like $64$, but as a variable, often denoted as $N$, $M$, or $?$. When a compiler handles dynamic shapes, it cannot pre-compute exact memory offsets. Instead, it must generate code that performs "shape inference" at runtime.

Consider a matrix multiplication between tensor $A$ of shape $(M, K)$ and tensor $B$ of shape $(K, N)$. In a dynamic setting, the compiler generates instructions to:

1. Read the runtime values of $M$, $K$, and $N$.
2. Verify that the inner dimensions match (runtime assertion).
3. Calculate the output size $(M, N)$.
4. Allocate memory for the result based on this calculation.

This flexibility comes with a cost. The generated binary must include overhead logic to manage these shapes, and the arithmetic kernels become generic. A kernel compiled for "size $N$" is generally less efficient than a kernel compiled for "size 1024" because the compiler cannot hard-code optimization constants or assume data alignment.
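To make the runtime steps above concrete, here is a minimal Python sketch of the wrapper logic a compiler must emit for a dynamic-shape matrix multiplication. The function name and the use of NumPy are illustrative assumptions, not the API of any particular compiler.

```python
import numpy as np

def dynamic_matmul(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Sketch of the logic emitted for a MatMul whose shapes are only known at runtime."""
    # 1. Read the runtime values of M, K, and N.
    m, k_a = a.shape
    k_b, n = b.shape

    # 2. Verify that the inner dimensions match (runtime assertion).
    if k_a != k_b:
        raise ValueError(f"Inner dimensions do not match: {k_a} vs {k_b}")

    # 3. Calculate the output size (M, N) and 4. allocate memory for the result.
    out = np.empty((m, n), dtype=a.dtype)

    # The loop bounds are runtime variables, so the compiler cannot unroll them
    # or hard-code vectorization constants the way it can for fixed bounds.
    for i in range(m):
        for j in range(n):
            out[i, j] = np.dot(a[i, :], b[:, j])
    return out
```

With static shapes, steps 1 through 3 collapse into compile-time constants, and only the allocation and the fixed-bound loops remain in the generated code.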
## Symbolic Shape Propagation

To bridge the gap between high-level frameworks and low-level code, ML compilers use symbolic shape propagation. Even if the exact values are unknown, the relationships between shapes are deterministic.

If an input tensor has shape $(Batch, 128)$ and passes through a Dense layer with 64 units, the output shape is $(Batch, 64)$. The compiler tracks the symbol $Batch$ through the graph. If a subsequent operation attempts to reshape this tensor into $(Batch, 32, 2)$, the compiler can statically verify that this is valid because $32 \times 2 = 64$. However, if the operation attempts to reshape it to $(Batch, 30, 2)$, the compiler can raise an error at compile time, even without knowing the value of $Batch$.

*Figure: Flow of symbolic dimensions through a computation graph. The compiler tracks the variable $N$ to ensure graph validity without knowing its runtime value.*

## Hybrid Approaches and Bucketing

Because dynamic shapes inhibit optimization, engineers often employ a hybrid strategy known as bucketing (or padding) when deploying models. Instead of supporting any arbitrary input size, the system supports a discrete set of "buckets."

For example, a serving system might compile specific kernels for sequence lengths of 32, 64, 128, and 256. If a user sends a request with length 50, the runtime pads the data to length 64 and executes the optimized kernel for 64. This approach trades a small amount of compute (processing the padding) for the high efficiency of statically compiled kernels, as sketched in the dispatch example below.

This trade-off is visible in the performance characteristics. Static compilation yields peak performance but requires re-compilation for every new shape. Dynamic compilation handles any shape but pays a consistent overhead on every call.

*Figure: Latency comparison across sequence lengths. Static optimized points (red) generally offer lower latency than the general-purpose dynamic execution (blue line) at specific design points.*
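A minimal sketch of that bucketed dispatch logic, assuming a hypothetical `kernels` table that maps each bucket size to a kernel compiled for exactly that sequence length:

```python
BUCKETS = (32, 64, 128, 256)  # sequence lengths with precompiled static kernels

def run_bucketed(tokens, kernels):
    """Pad the request up to the nearest bucket and run its static kernel."""
    seq_len = len(tokens)
    # Smallest bucket that fits the request, e.g. length 50 -> bucket 64.
    bucket = next((b for b in BUCKETS if b >= seq_len), None)
    if bucket is None:
        raise ValueError(f"Sequence length {seq_len} exceeds the largest bucket")
    padded = list(tokens) + [0] * (bucket - seq_len)  # extra compute on padding...
    return kernels[bucket](padded)[:seq_len]          # ...in exchange for a static kernel
```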
## Just-in-Time (JIT) Specialization

Modern ML compilers often use JIT compilation to handle dynamic shapes while retaining optimization benefits. When the model receives an input with a specific shape (e.g., batch size 1), the JIT compiler generates a specialized kernel for that shape on the fly and caches it. If the next input has the same shape, the cached kernel is reused. If the shape changes significantly, a new kernel is compiled.

This method assumes that the distribution of input shapes is not random. In many production environments, input sizes follow a power-law distribution, meaning a small number of unique shapes account for the majority of traffic. This allows the compiler to specialize for the frequent cases while falling back to a generic dynamic kernel for rare shapes.
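A minimal sketch of that caching behavior, assuming a hypothetical `compile_for_shape` hook that stands in for the real code generator and inputs that expose a NumPy-style `.shape` attribute:

```python
class ShapeSpecializingJit:
    """Dispatch to a kernel specialized for the exact input shape,
    compiling and caching a new one the first time a shape is seen."""

    def __init__(self, compile_for_shape):
        self._compile = compile_for_shape  # shape tuple -> callable kernel
        self._cache = {}

    def __call__(self, x):
        key = tuple(x.shape)               # e.g. (1, 128, 512)
        kernel = self._cache.get(key)
        if kernel is None:
            kernel = self._compile(key)    # slow path: specialize for this shape
            self._cache[key] = kernel
        return kernel(x)                   # fast path: reuse the cached kernel
```

Because production traffic tends to follow the skewed shape distribution described above, the slow compile path is hit rarely relative to the cached fast path.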