Automated scheduling systems in deep learning compilers have evolved from template-based approaches to fully automated generation methods. Understanding the architectural differences between AutoTVM and Ansor (also known as the AutoScheduler) is fundamental for engineers designing custom compiler backends or optimizing novel hardware accelerators. While both systems share the goal of maximizing tensor throughput, they diverge sharply in how they define and traverse the optimization search space.
AutoTVM represents the first generation of statistical tuning in the TVM stack. Its architecture relies on the premise that while the exact parameters for loop transformations (such as tile sizes or unrolling factors) are difficult to predict, the general structure of an efficient schedule is often known by human experts.
In this architecture, the search space is not automatically derived from the computational graph. Instead, a domain expert must define a schedule template. A template is a Python function written using the TVM scheduling language that includes specific "knobs" or "axes" defining tunable parameters.
For a matrix multiplication operation, a user might write a template that defines the loop order and memory scope but leaves the tiling factors as variables, as in the sketch below.
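A minimal sketch of such a template using TVM's `autotvm` API follows. The task name `"example/matmul"` and the simple two-level tiling split are illustrative choices, not a canonical implementation:

```python
import tvm
from tvm import te, autotvm

# Hypothetical task name; any unique string works for registration.
@autotvm.template("example/matmul")
def matmul_template(N, L, M, dtype):
    A = te.placeholder((N, L), name="A", dtype=dtype)
    B = te.placeholder((L, M), name="B", dtype=dtype)
    k = te.reduce_axis((0, L), name="k")
    C = te.compute(
        (N, M),
        lambda i, j: te.sum(A[i, k] * B[k, j], axis=k),
        name="C",
    )

    s = te.create_schedule(C.op)
    y, x = s[C].op.axis

    # The "knobs": the loop structure is fixed by the author,
    # but the tile sizes are left for the tuner to explore.
    cfg = autotvm.get_config()
    cfg.define_split("tile_y", y, num_outputs=2)
    cfg.define_split("tile_x", x, num_outputs=2)

    yo, yi = cfg["tile_y"].apply(s, C, y)
    xo, xi = cfg["tile_x"].apply(s, C, x)
    s[C].reorder(yo, xo, yi, xi)

    return s, [A, B, C]
```

Every combination of `tile_y` and `tile_x` values defines one point in the search space; the tuner's job is to find the best combination for the target hardware.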
The AutoTVM architecture consists of three primary components interacting in a loop:

- A search space, defined by the knobs in the user-written template, which enumerates the candidate schedules.
- A tuner, which selects the next batch of candidates to try, typically guided by a statistical cost model.
- A measurement module (a builder and a runner, often connected to remote devices over RPC), which compiles the candidates and records their execution times on the target hardware.
The limitation of this architecture lies in the template itself. The search space is restricted to the variations the template author anticipated. If the optimal strategy involves a transformation the template does not capture, such as a particular form of tensor layout rewriting or a complex fusion pattern, AutoTVM cannot discover it.
Ansor, introduced to address the scalability limits of AutoTVM, shifts from template-based definition to search space generation. It removes the requirement for human-written templates. Instead, Ansor analyzes the mathematical definition of the computation (typically in Tensor Expression form) and automatically constructs a large, comprehensive search space.
The Ansor architecture operates through a hierarchical process:

- Sketch generation: derivation rules applied to the tensor expression produce a small set of high-level loop structures ("sketches") that capture decisions such as multi-level tiling, fusion, and caching.
- Random annotation: each sketch is completed by sampling the low-level details, such as concrete tile sizes, unrolling factors, and parallelization annotations, yielding complete candidate programs.
- Performance tuning: an evolutionary search, guided by a learned cost model, iteratively refines the sampled programs, while a task scheduler allocates tuning time across the subgraphs of the network.
The search algorithm in Ansor is typically an evolutionary strategy. It maintains a population of valid schedules and applies mutations, such as changing a tile size or swapping a loop order, to generate new candidates.
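In practice, invoking Ansor requires only the computation definition, with no schedule attached. The sketch below uses TVM's `auto_scheduler` API; the workload size, trial budget, and log file name are illustrative:

```python
import tvm
from tvm import te, auto_scheduler

@auto_scheduler.register_workload
def matmul(N, L, M, dtype):
    A = te.placeholder((N, L), name="A", dtype=dtype)
    B = te.placeholder((L, M), name="B", dtype=dtype)
    k = te.reduce_axis((0, L), name="k")
    C = te.compute(
        (N, M),
        lambda i, j: te.sum(A[i, k] * B[k, j], axis=k),
        name="C",
    )
    # Note: no schedule is returned. Ansor derives the search space itself.
    return [A, B, C]

target = tvm.target.Target("llvm")
task = auto_scheduler.SearchTask(
    func=matmul, args=(1024, 1024, 1024, "float32"), target=target
)

log_file = "matmul_ansor.json"  # illustrative file name
tune_option = auto_scheduler.TuningOptions(
    num_measure_trials=64,  # small budget for illustration
    measure_callbacks=[auto_scheduler.RecordToFile(log_file)],
)
task.tune(tune_option)
sch, args = task.apply_best(log_file)
```

Compared with the AutoTVM sketch earlier, the tunable structure is nowhere in user code: sketches and annotations replace the hand-written knobs.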
The following diagram contrasts the workflow of these two architectures, highlighting the shift from manual template definition to automatic rule application.
Comparison of AutoTVM and Ansor pipelines. AutoTVM relies on explicit user templates to define the bounds of optimization, whereas Ansor derives the search space algorithmically from the computation definition.
Both architectures rely heavily on statistical cost models to prune the search space. Evaluating a schedule on real hardware takes on the order of seconds per candidate (compilation, upload, and repeated timed runs), which is far too slow when the search space contains billions of possibilities. A cost model acts as a surrogate, predicting the performance of a schedule in microseconds.
In AutoTVM, the input to the cost model is a feature vector extracted from the loop configuration (e.g., loop extents, memory access counts). Ansor improves upon this by using a more structured representation. It extracts features from the low-level intermediate representation (IR) itself, including:

- Arithmetic features, such as counts of floating-point and integer operations per statement.
- Memory access features, such as the number of bytes touched per buffer and reuse distances.
- Loop annotation features, such as vectorization, unrolling, and parallelization markers together with the extents of the loops they apply to.
Ansor typically employs a Gradient Boosted Decision Tree (GBDT) such as XGBoost as its default cost model. The system operates in rounds. In each round, the search policy selects a batch of candidates. These candidates are compiled and measured on hardware to obtain ground-truth execution times. This data is used to retrain the cost model, improving its prediction accuracy for the next generation of candidates.
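A minimal sketch of this measure-and-retrain cycle is shown below. It illustrates the general pattern rather than TVM's internal code: `sample_candidates`, `featurize`, and `measure_on_hardware` are hypothetical helpers standing in for the evolutionary search, feature extraction, and the measurement module:

```python
import numpy as np
import xgboost as xgb

num_rounds, batch_size = 10, 64  # illustrative tuning budget
model = None
feats, times = [], []  # training data accumulated across rounds

for _ in range(num_rounds):
    # Evolutionary search proposes new candidates (hypothetical helper).
    candidates = sample_candidates()

    if model is not None:
        # Rank candidates by predicted latency; measure only the best batch.
        X = np.array([featurize(c) for c in candidates])
        scores = model.predict(xgb.DMatrix(X))
        ranked = sorted(zip(scores, candidates), key=lambda p: p[0])
        candidates = [c for _, c in ranked[:batch_size]]

    # Ground-truth execution times on the device (hypothetical helper).
    measured = [measure_on_hardware(c) for c in candidates]

    feats.extend(featurize(c) for c in candidates)
    times.extend(measured)

    # Retrain the GBDT on all data gathered so far.
    dtrain = xgb.DMatrix(np.array(feats), label=np.array(times))
    model = xgb.train(
        {"objective": "reg:squarederror"}, dtrain, num_boost_round=50
    )
```

Each round, the model's predictions steer measurement toward promising candidates, and each batch of measurements improves the model in turn.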
While Ansor is generally superior for standard deep learning workloads due to its broader search coverage, AutoTVM remains relevant for specialized, domain-specific operators where Ansor's automatic derivation rules fail to produce the required schedule structure.
The generation-based approach of Ansor solves the problem of "local optima" inherent in templates. A template might enforce a specific cache hierarchy mapping that is optimal for ResNet-50 but suboptimal for a slightly different topology like MobileNetV3. By generating sketches dynamically, Ansor adapts the high-level strategy to the specific tensor shapes and hardware constraints present at compile time.
The trade-off is search time. Ansor often requires more time to converge because the space it searches is orders of magnitude larger than a constrained template. However, techniques such as transfer learning, using a cost model trained on similar tasks to warm-start the search, can significantly reduce this overhead.
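With TVM's `auto_scheduler`, such a warm start can be sketched as follows, assuming the `task`, `log_file`, and `tune_option` from the earlier example; the log file here stands in for records gathered on similar tasks:

```python
from tvm import auto_scheduler

# Train the cost model on measurement records from earlier, similar runs.
cost_model = auto_scheduler.XGBModel()
cost_model.update_from_file(log_file)

# Seed the search with previously measured states so the evolutionary
# search starts from known-good regions instead of random samples.
search_policy = auto_scheduler.SketchPolicy(
    task,
    program_cost_model=cost_model,
    init_search_callbacks=[auto_scheduler.PreloadMeasuredStates(log_file)],
)
task.tune(tune_option, search_policy=search_policy)
```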
Understanding these architectures allows developers to debug performance regressions. If an operator is underperforming, one must determine if the limitation stems from the search algorithm failing to find the best schedule (tuning issue) or if the search space itself lacks the necessary transformations (representation issue). In AutoTVM, the fix is rewriting the template. In Ansor, the fix involves adding new sketch generation rules or custom intrinsic mappings.
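One way to start that diagnosis, assuming the `task` and `log_file` from the sketches above, is to apply the best record found during tuning and inspect the loop structure the search actually produced:

```python
import tvm

# Lower the best schedule found and print its IR: missing tiling,
# vectorization, or fusion here points to a representation issue
# rather than an under-tuned search.
sch, args = task.apply_best(log_file)
print(tvm.lower(sch, args, simple_mode=True))
```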