Optimizing code manually requires selecting specific parameters for loop tiling, vectorization, and unrolling. For a single matrix multiplication kernel, the volume of possible valid schedules is large. If a loop nest has $n$ dimensions and dimension $i$ admits a set $T_i$ of candidate tiling factors, the potential combinations for just these parameters can be calculated as:

$$\prod_{i=1}^{n} |T_i|$$
When these tiling choices are combined with loop reordering, thread binding, and memory scope decisions, the total configuration space grows exponentially. This makes manual tuning impractical for complex neural networks.
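To make this growth concrete, the short sketch below counts the choices for a hypothetical 1024 x 1024 x 1024 matrix multiplication, treating every divisor of a loop extent as a legal tile size and every ordering of the tiled nest as legal. The matrix size and both simplifications are illustrative assumptions, not a setup used later in the chapter.

```python
from math import factorial, prod

def divisors(n):
    """All positive divisors of n; each is a legal tile size for a loop of extent n."""
    return [d for d in range(1, n + 1) if n % d == 0]

# Hypothetical 1024 x 1024 x 1024 matrix multiplication: three loops i, j, k.
loop_extents = {"i": 1024, "j": 1024, "k": 1024}

# Candidate tile sizes per loop (the sets T_i in the formula above).
tile_candidates = {name: divisors(extent) for name, extent in loop_extents.items()}

# Tiling choices alone: the product of |T_i| over all loops.
tiling_combos = prod(len(c) for c in tile_candidates.values())

# Tiling each loop once yields a six-deep nest; count its orderings,
# ignoring dependence constraints, so this is an upper bound.
reorderings = factorial(2 * len(loop_extents))

print("tiling choices:           ", tiling_combos)                 # 11 ** 3 = 1331
print("with loop reordering too: ", tiling_combos * reorderings)   # 1331 * 720
```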
This chapter addresses the automation of kernel generation through auto-tuning. We define the optimization search space as the set of all legal transformations for a given computational graph. You will study how compilers define these spaces and use search algorithms, such as simulated annealing and genetic algorithms, to identify high-performance schedules.
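As a preview of Section 6.2, the sketch below shows the shape of one such search: simulated annealing over a tiny space of tile sizes. The `measure` function returns a synthetic cost so the example runs on its own; in a real tuner it would compile the candidate schedule and time it on hardware. All names and constants are illustrative assumptions.

```python
import math
import random

# One point in the search space: a tile size for each matmul loop.
TILE_SIZES = [1, 2, 4, 8, 16, 32, 64, 128]

def random_config():
    return {"tile_i": random.choice(TILE_SIZES),
            "tile_j": random.choice(TILE_SIZES),
            "tile_k": random.choice(TILE_SIZES)}

def mutate(cfg):
    """Perturb one knob: the neighborhood move used by simulated annealing."""
    neighbor = dict(cfg)
    knob = random.choice(list(neighbor))
    neighbor[knob] = random.choice(TILE_SIZES)
    return neighbor

def measure(cfg):
    """Stand-in for compiling and timing the schedule on hardware.
    This synthetic cost simply prefers mid-sized tiles plus some noise;
    a real tuner would return measured latency."""
    return sum(abs(math.log2(v) - 5) for v in cfg.values()) + random.random() * 0.1

def simulated_annealing(steps=500, temp=2.0, cooling=0.99):
    current = random_config()
    current_cost = measure(current)
    best, best_cost = current, current_cost
    for _ in range(steps):
        candidate = mutate(current)
        cost = measure(candidate)
        # Always accept improvements; accept regressions with a probability
        # that shrinks as the temperature cools.
        if cost < current_cost or random.random() < math.exp((current_cost - cost) / temp):
            current, current_cost = candidate, cost
        if cost < best_cost:
            best, best_cost = candidate, cost
        temp *= cooling
    return best, best_cost

print(simulated_annealing())
```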
Evaluating every candidate schedule on hardware creates a bottleneck. To address this, we introduce statistical cost models. These models use machine learning to predict the throughput of a specific configuration, allowing the search algorithm to estimate performance without hardware execution. The text covers the architecture of modern frameworks like AutoTVM and Ansor, contrasting template-based approaches with generation-based methods. You will learn to configure these systems to automatically find optimal parameters for deep learning operators.
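The sketch below illustrates the measure-predict-retrain loop that such cost models enable, using a generic gradient-boosted regressor as a stand-in for the learned models these frameworks train. It is not the AutoTVM or Ansor implementation; `measure_on_hardware`, its synthetic latency, and the batch sizes are all hypothetical.

```python
import random
from sklearn.ensemble import GradientBoostingRegressor

TILE_SIZES = [1, 2, 4, 8, 16, 32, 64, 128]

def random_config():
    """A configuration is a tile size for each of the three matmul loops."""
    return [random.choice(TILE_SIZES) for _ in range(3)]

def measure_on_hardware(cfg):
    """Placeholder for the expensive step: compile the schedule and time it.
    The synthetic latency below exists only to make the sketch runnable."""
    ti, tj, tk = cfg
    return abs(ti - 32) + abs(tj - 32) + abs(tk - 16) + random.random()

# Seed the cost model with a small set of real measurements.
measured = [(cfg, measure_on_hardware(cfg)) for cfg in (random_config() for _ in range(32))]
model = GradientBoostingRegressor()
model.fit([c for c, _ in measured], [t for _, t in measured])

# Each round: score a large candidate batch with the cheap model,
# measure only the most promising few on hardware, and retrain.
for _ in range(5):
    candidates = [random_config() for _ in range(1000)]
    predicted = model.predict(candidates)
    top = sorted(zip(predicted, candidates))[:8]   # lowest predicted latency
    measured += [(cfg, measure_on_hardware(cfg)) for _, cfg in top]
    model.fit([c for c, _ in measured], [t for _, t in measured])

best_cfg, best_latency = min(measured, key=lambda pair: pair[1])
print("best config:", best_cfg, "measured latency:", round(best_latency, 2))
```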
6.1 Defining the Search Space
6.2 Search Algorithms: Random to Genetic
6.3 Machine Learning Cost Models
6.4 Ansor and AutoTVM Architecture
6.5 Hands-on Practical: Auto-Tuning a ResNet Block