Automated scheduling systems in deep learning compilers have evolved from template-based approaches to fully automated generation methods. Understanding the architectural differences between AutoTVM and Ansor (also known as the AutoScheduler) is fundamental for engineers designing custom compiler backends or optimizing novel hardware accelerators. While both systems share the goal of maximizing tensor throughput, they diverge fundamentally in how they define and traverse the optimization search space.

The Template-Based Approach: AutoTVM

AutoTVM represents the first generation of statistical tuning in the TVM stack. Its architecture rests on the premise that while the exact parameters for loop transformations (such as tile sizes or unrolling factors) are difficult to predict, the general structure of an efficient schedule is often known to human experts.

In this architecture, the search space is not automatically derived from the computational graph. Instead, a domain expert must define a schedule template: a Python function, written in the TVM scheduling language, that exposes specific "knobs" or "axes" as tunable parameters.

For a matrix multiplication operation $C = A \times B$, a user might write a template that fixes the loop order and memory scope but leaves the tiling factors as variables, for example restricting each split factor to a set of powers of two:

$$ T_{split} \in \{1, 2, 4, 8, 16, 32, 64\} $$

The AutoTVM architecture consists of three primary components interacting in a loop:

The Tuner: a search algorithm (such as grid search, random search, or the XGBoost-based XGBTuner) that proposes specific configurations from the user-defined template.

The Runner: a distinct process that compiles the kernel with the proposed configuration, uploads it to the target device (remote or local), runs the benchmark, and returns the execution time.

The Cost Model: a statistical model that learns the relationship between the configuration parameters and the measured execution time.
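The interaction between these three components can be sketched as a toy tuning loop in pure Python. All names here are hypothetical stand-ins for illustration; a real template would be written against the `tvm.autotvm` API, and the analytic "runner" stands in for compiling and timing on hardware.

```python
import itertools
import random

# The "template" fixes the loop structure and exposes only these knobs.
KNOBS = {
    "tile_x": [1, 2, 4, 8, 16, 32, 64],
    "tile_y": [1, 2, 4, 8, 16, 32, 64],
    "unroll": [0, 1],
}

def config_space(knobs):
    """Enumerate the Cartesian product of all knob values."""
    names = list(knobs)
    for values in itertools.product(*(knobs[n] for n in names)):
        yield dict(zip(names, values))

def fake_runner(cfg):
    """Stand-in for compile + benchmark: pretend 16x16 tiles with
    unrolling are fastest on this imaginary device (lower is better)."""
    cost = abs(cfg["tile_x"] - 16) + abs(cfg["tile_y"] - 16)
    return cost + (0 if cfg["unroll"] else 5)

random.seed(0)
space = list(config_space(KNOBS))              # 7 * 7 * 2 = 98 configs
trials = random.sample(space, 32)              # the Tuner proposes configs
measured = [(fake_runner(c), c) for c in trials]   # the Runner measures them
best_time, best_cfg = min(measured, key=lambda tc: tc[0])
print(best_cfg)
```

In a real deployment the random sampling above would be replaced by a model-guided tuner, which is exactly the role of the Cost Model described next.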
As more data points are collected from the Runner, the Cost Model is updated to predict high-performance configurations more accurately.

The limitation of this architecture lies in the template itself. The search space is restricted to the variations the template author anticipated. If the optimal strategy involves a transformation the template does not capture, such as a particular form of tensor layout rewriting or a complex fusion pattern, AutoTVM cannot discover it.

The Generation-Based Approach: Ansor

Ansor, introduced to address the scalability limits of AutoTVM, shifts from template-based definition to search space generation. It removes the requirement for human-written templates. Instead, Ansor analyzes the mathematical definition of the computation (typically in Tensor Expression form) and automatically constructs a large, comprehensive search space.

The Ansor architecture operates through a hierarchical process:

Program Sampling (Sketch Generation): Ansor does not start with a fixed loop structure. It analyzes the compute DAG and generates high-level structures called "sketches." A sketch is the skeleton of a schedule, produced by applying high-level derivation rules such as "always tile the output block" or "fuse element-wise operations into the reduction loop."

Performance Tuning (Annotation): Once sketches are generated, the system randomly fills them with concrete choices: specific tile sizes, vector lengths, and unrolling factors. This step transforms a high-level sketch into a concrete, executable program.

Task Scheduling: For end-to-end neural networks, Ansor allocates tuning time across subgraphs. It prioritizes the operators with the highest impact on end-to-end latency, avoiding wasted time on layers that contribute little to the total runtime.

The search algorithm in Ansor is typically an evolutionary strategy.
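The split between structural sketches and concrete annotations can be illustrated with a toy enumeration. This is pure Python for illustration only; the rule names and parameter lists are hypothetical simplifications, not the real `auto_scheduler` rule set.

```python
import random

# Two hypothetical derivation rules: the second only fires when the
# compute DAG has an element-wise epilogue that can be fused.
SKETCH_RULES = [
    "multi-level tiling",
    "multi-level tiling + fuse elementwise",
]

def generate_sketches(has_elementwise_epilogue):
    """Derive structural skeletons from properties of the compute DAG."""
    sketches = [SKETCH_RULES[0]]
    if has_elementwise_epilogue:
        sketches.append(SKETCH_RULES[1])
    return sketches

def annotate(sketch, rng):
    """Fill a sketch with concrete tile sizes and unroll factors,
    turning the skeleton into a complete candidate program."""
    return {
        "sketch": sketch,
        "tile": rng.choice([4, 8, 16, 32]),
        "vector_len": rng.choice([4, 8]),
        "unroll": rng.choice([0, 16, 64]),
    }

rng = random.Random(0)
sketches = generate_sketches(has_elementwise_epilogue=True)
population = [annotate(rng.choice(sketches), rng) for _ in range(8)]
print(len(sketches), len(population))
```

The key point of the two-level design: the number of sketches stays small, while the annotation step multiplies each sketch into a vast space of concrete programs.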
The evolutionary search maintains a population of valid schedules and applies mutations, such as changing a tile size or swapping a loop order, to generate new candidates.

The following diagram contrasts the workflow of the two architectures, highlighting the shift from manual template definition to automatic rule application.

digraph CompilerArchitecture {
    rankdir=TB;
    bgcolor="#ffffff";
    node [fontname="Sans-Serif", shape=box, style=filled, color="#dee2e6", fillcolor="#f8f9fa", penwidth=1.5];
    edge [fontname="Sans-Serif", color="#495057", penwidth=1.2];

    subgraph cluster_0 {
        label = "AutoTVM (Template-Based)";
        fontname = "Sans-Serif";
        fontsize = 14;
        color = "#ced4da";
        style = dashed;
        User1 [label="User / Expert", fillcolor="#e7f5ff", color="#74c0fc"];
        Template [label="Schedule Template\n(Manual Definition)", fillcolor="#e7f5ff", color="#74c0fc"];
        ConfigSpace [label="Configuration Space\n(Fixed Parameters)", fillcolor="#e7f5ff", color="#74c0fc"];
        Tuner [label="Search Algorithm\n(XGBoost/Random)", fillcolor="#fff5f5", color="#ff8787"];
        User1 -> Template [label="Writes"];
        Template -> ConfigSpace [label="Defines"];
        ConfigSpace -> Tuner [label="Explores"];
    }

    subgraph cluster_1 {
        label = "Ansor (Generation-Based)";
        fontname = "Sans-Serif";
        fontsize = 14;
        color = "#ced4da";
        style = dashed;
        ComputeDef [label="Compute Definition\n(Tensor Expression)", fillcolor="#f3f0ff", color="#b197fc"];
        SketchGen [label="Sketch Generation\n(Derivation Rules)", fillcolor="#f3f0ff", color="#b197fc"];
        Evolution [label="Evolutionary Search\n(Mutation/Crossover)", fillcolor="#fff5f5", color="#ff8787"];
        ComputeDef -> SketchGen [label="Analyzes"];
        SketchGen -> Evolution [label="Generates Space"];
    }

    Hardware [label="Hardware Runner\n(RPC / Profiler)", fillcolor="#d8f5a2", color="#94d82d"];
    CostModel [label="Cost Model\n(Throughput Predictor)", fillcolor="#ffe3e3", color="#ffa8a8"];
    Tuner -> Hardware [label="Measures"];
    Evolution -> Hardware [label="Measures"];
    Hardware -> CostModel [label="Training Data"];
    CostModel -> Tuner [label="Guides", style=dotted];
    CostModel -> Evolution [label="Guides", style=dotted];
}

Comparison of AutoTVM and Ansor pipelines. AutoTVM relies on explicit user templates to define the bounds of optimization, whereas Ansor derives the search space algorithmically from the computation definition.

Search Policy and Cost Modeling

Both architectures rely heavily on statistical cost models to prune the search space. Measuring a schedule on real hardware takes milliseconds to seconds per candidate, which is far too slow when the search space contains billions of possibilities. A cost model acts as a surrogate, predicting the performance of a schedule in microseconds.

In AutoTVM, the input to the cost model is a feature vector extracted from the loop configuration (e.g., loop extents, memory access counts). Ansor improves on this with a more structured representation: it extracts features from the low-level intermediate representation (IR) itself, including:

Arithmetic Intensity: the ratio of floating-point operations to memory bytes accessed.

Memory Access Patterns: stride and alignment characteristics of buffer accesses.

Loop Structure: the nesting depth and extent of parallel loops.

Ansor typically employs a gradient-boosted decision tree (GBDT) model such as XGBoost as its default cost model. The system operates in rounds: in each round, the search policy selects a batch of candidates, which are compiled and measured on hardware to obtain ground-truth execution times. This data is used to retrain the cost model, improving its prediction accuracy for the next generation of candidates.

Convergence and Trade-offs

While Ansor is generally superior for standard deep learning workloads due to its broader search coverage, AutoTVM remains relevant for specialized, domain-specific operations where automatic derivation rules fail.

The generation-based approach of Ansor solves the problem of "local optima" inherent in templates.
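The round-based loop described above (mutate candidates, rank them with the surrogate, measure the most promising batch, retrain) can be condensed into a toy simulation. Everything here is a hypothetical stand-in: a nearest-neighbour lookup plays the role of the XGBoost cost model, and an analytic function plays the role of hardware measurement.

```python
import random

def hardware_time(cfg):
    """Pretend measurement: tile 16 plus vectorization is fastest."""
    return abs(cfg["tile"] - 16) + (4 if cfg["vec"] == 1 else 0)

def mutate(cfg, rng):
    """Evolutionary step: perturb one knob of an existing schedule."""
    child = dict(cfg)
    if rng.random() < 0.5:
        child["tile"] = rng.choice([4, 8, 16, 32])
    else:
        child["vec"] = rng.choice([1, 4, 8])
    return child

def predict(cfg, history, rng):
    """Surrogate cost model: return the measured time of the nearest
    previously measured config (random before any data exists)."""
    if not history:
        return rng.random()
    nearest = min(history, key=lambda ct: abs(ct[0]["tile"] - cfg["tile"]))
    return nearest[1]

rng = random.Random(0)
population = [{"tile": rng.choice([4, 8, 32]), "vec": 1} for _ in range(16)]
history = []  # (config, measured time) pairs; the model's training data

for _ in range(4):  # search rounds
    candidates = [mutate(c, rng) for c in population for _ in range(2)]
    candidates.sort(key=lambda c: predict(c, history, rng))
    batch = candidates[:8]              # most promising under the model
    for cfg in batch:                   # ground-truth "measurement"
        history.append((cfg, hardware_time(cfg)))
    population = batch

best_cfg, best_time = min(history, key=lambda ct: ct[1])
print(best_cfg, best_time)
```

Each round the surrogate gets more training data, so later rounds spend their limited measurement budget on increasingly plausible candidates, which is the core economy of both systems.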
A template might enforce a cache-hierarchy mapping that is optimal for ResNet-50 but suboptimal for a slightly different topology such as MobileNetV3. By generating sketches dynamically, Ansor adapts the high-level strategy to the specific tensor shapes and hardware constraints present at compile time.

The trade-off is search time. Ansor often requires more time to converge because the space it searches is orders of magnitude larger than that of a constrained template. However, techniques such as transfer learning, using a cost model trained on similar tasks to warm-start the search, can significantly reduce this overhead.

Understanding these architectures allows developers to debug performance regressions. If an operator is underperforming, one must determine whether the limitation stems from the search algorithm failing to find the best schedule (a tuning issue) or from the search space itself lacking the necessary transformations (a representation issue). In AutoTVM, the fix is rewriting the template; in Ansor, it involves adding new sketch generation rules or custom intrinsic mappings.
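As a concrete instance of the arithmetic-intensity feature mentioned under cost modeling, the value for a dense float32 matrix multiplication can be computed directly, assuming, as an idealised lower bound, that each tensor moves between DRAM and the chip exactly once:

```python
def matmul_arithmetic_intensity(M, N, K, dtype_bytes=4):
    """FLOPs per byte for C[M,N] = A[M,K] @ B[K,N] under ideal reuse."""
    flops = 2 * M * N * K                            # one multiply + one add per MAC
    traffic = dtype_bytes * (M * K + K * N + M * N)  # read A, read B, write C
    return flops / traffic

# Large square matmuls are strongly compute-bound...
print(matmul_arithmetic_intensity(1024, 1024, 1024))  # ≈ 170.7 FLOPs/byte
# ...while a matrix-vector-like shape is memory-bound.
print(matmul_arithmetic_intensity(1, 1024, 1024))     # ≈ 0.5 FLOPs/byte
```

Shapes like the second one are exactly where a schedule's memory access pattern, rather than its compute mapping, dominates the cost model's prediction.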