Compiler Optimizations for Machine Learning
Chapter 1: Anatomy of Deep Learning Compilers
Compilation Pipeline Overview
Computational Graphs and DAGs
Static Single Assignment in ML
High-Level vs Low-Level IR
Hands-on Practical: Inspecting Relay IR
Chapter 2: Graph-Level Transformations
Operator Fusion Strategies
Dead Code and Common Subexpression Elimination
Hands-on Practical: Implementing a Fusion Pass
Chapter 3: Kernel Optimization and Loop Schedules
The Loop Optimization Space
Loop Reordering and Unrolling
Hands-on Practical: Optimizing Matrix Multiplication
Chapter 4: The MLIR Infrastructure
MLIR Architecture and Dialects
Affine Dialect and Analysis
Pattern Rewriting and Lowering
Hands-on Practical: Creating a Custom Dialect
Chapter 5: Hardware-Aware Code Generation
GPU Memory Hierarchy Mapping
Thread Binding and Warp Divergence
Shared Memory Banking and Conflicts
Hands-on Practical: Writing Triton Kernels
Chapter 6: Auto-Tuning and Search Spaces
Defining the Search Space
Search Algorithms: Random to Genetic
Machine Learning Cost Models
Ansor and AutoTVM Architecture
Hands-on Practical: Auto-Tuning a ResNet Block