Machine learning models require substantial computational resources, necessitating efficient translation from high-level frameworks to machine code. This course examines the architecture and engineering principles behind deep learning compilers such as Apache TVM, XLA, and the MLIR infrastructure. It addresses the transformation pipeline from computational graphs to hardware-specific binaries.
Participants will analyze intermediate representations (IR), graph-level optimizations including operator fusion and layout transformation, and low-level loop optimizations like tiling and vectorization. The curriculum covers the polyhedral model, automatic scheduling strategies, and memory hierarchy management for GPUs and specialized accelerators. Emphasis is placed on the interaction between software logic and hardware constraints to maximize throughput and minimize latency.
Prerequisites: Strong Python proficiency and core machine learning concepts
Level:
Intermediate Representations
Design and manipulate high-level and low-level intermediate representations for tensor computations.
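As a flavor of what designing an IR involves, here is a minimal sketch of a high-level tensor IR as immutable expression nodes with a printer. The node classes and names (`Tensor`, `Add`, `Mul`, `pretty`) are illustrative inventions for this sketch, not the IR of TVM, XLA, or MLIR.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    """Base class for all IR nodes."""

@dataclass(frozen=True)
class Tensor(Node):
    name: str
    shape: tuple

@dataclass(frozen=True)
class Add(Node):
    lhs: Node
    rhs: Node

@dataclass(frozen=True)
class Mul(Node):
    lhs: Node
    rhs: Node

def pretty(node: Node) -> str:
    """Render the IR as a nested prefix expression."""
    if isinstance(node, Tensor):
        return node.name
    if isinstance(node, Add):
        return f"(add {pretty(node.lhs)} {pretty(node.rhs)})"
    if isinstance(node, Mul):
        return f"(mul {pretty(node.lhs)} {pretty(node.rhs)})"
    raise TypeError(f"unknown node: {node!r}")

a = Tensor("A", (4, 4))
b = Tensor("B", (4, 4))
expr = Add(Mul(a, b), b)   # represents A*B + B
print(pretty(expr))        # (add (mul A B) B)
```

Immutable nodes make structural rewrites safe to apply: a transformation builds a new tree rather than mutating shared subexpressions, which is the same design choice most production IRs make.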
Graph Optimization
Implement graph-level transformations such as operator fusion, constant folding, and dead code elimination.
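The transformations above can be sketched on a toy dataflow graph. This example shows constant folding: operators whose inputs are all constants are evaluated at compile time, while ops without a folding rule (here, `relu`) are left symbolic. The `Const`/`Op` representation is an assumption made for this sketch, not any framework's actual graph API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Const:
    value: float

@dataclass(frozen=True)
class Op:
    name: str
    inputs: tuple

def fold(node):
    """Recursively replace ops whose inputs are all constants."""
    if isinstance(node, Const):
        return node
    inputs = tuple(fold(i) for i in node.inputs)
    if all(isinstance(i, Const) for i in inputs):
        if node.name == "add":
            return Const(sum(i.value for i in inputs))
        if node.name == "mul":
            out = 1.0
            for i in inputs:
                out *= i.value
            return Const(out)
    # No folding rule: rebuild the op over the (possibly folded) inputs.
    return Op(node.name, inputs)

g = Op("add", (Op("mul", (Const(2.0), Const(3.0))),
               Op("relu", (Const(-1.0),))))
folded = fold(g)
# The mul(2, 3) subgraph folds to Const(6.0); relu stays symbolic.
```

Dead code elimination follows the same recursive pattern in reverse: walk back from the graph outputs and drop any node never reached.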
Loop Transformations
Apply advanced loop optimizations including tiling, unrolling, and reordering to maximize cache locality.
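Tiling can be illustrated on a plain matrix multiply: the loop nest is split into blocks so each block's working set fits in cache and `B`'s rows are reused before eviction. This is a sketch in pure Python (the tile size of 2 is arbitrary for the demo); a real compiler would emit the equivalent restructured loop nest in machine code.

```python
def matmul_naive(A, B, n):
    """Reference triple loop: i-j-k order."""
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s = 0.0
            for k in range(n):
                s += A[i][k] * B[k][j]
            C[i][j] = s
    return C

def matmul_tiled(A, B, n, tile=2):
    """Same computation with the i/k/j loops split into tiles,
    improving locality on B's rows for larger n."""
    C = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, tile):
        for kk in range(0, n, tile):
            for jj in range(0, n, tile):
                for i in range(ii, min(ii + tile, n)):
                    for k in range(kk, min(kk + tile, n)):
                        a = A[i][k]   # hoisted: invariant in j
                        for j in range(jj, min(jj + tile, n)):
                            C[i][j] += a * B[k][j]
    return C

n = 4
A = [[float(i * n + j) for j in range(n)] for i in range(n)]
B = [[float((i + j) % 3) for j in range(n)] for i in range(n)]
assert matmul_tiled(A, B, n) == matmul_naive(A, B, n)
```

Note the transformation changes only the iteration order, not the set of operations performed, which is why it must be validated against a dependence analysis before a compiler applies it.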
MLIR Infrastructure
Utilize the Multi-Level Intermediate Representation framework to build custom dialects and lowering pipelines.
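The core MLIR idea of progressive lowering between dialects can be mimicked in a few lines: ops carry a dialect name, and registered patterns rewrite ops of one dialect into a lower-level one. Everything here (the `tensor`/`loop` dialect names, the dict encoding, the `lowering` registry) is a toy analogy for exposition, not MLIR's actual C++ or Python API.

```python
# Registry mapping (dialect, op-name) -> lowering function.
LOWERINGS = {}

def lowering(dialect, name):
    """Decorator registering a rewrite pattern, in the spirit of
    MLIR's dialect conversion patterns (toy version)."""
    def register(fn):
        LOWERINGS[(dialect, name)] = fn
        return fn
    return register

@lowering("tensor", "add")
def lower_tensor_add(op):
    """Rewrite a high-level elementwise add into loop-level text."""
    n = op["shape"][0]
    a, b = op["args"]
    out = op["out"]
    return [f"for i in 0..{n}:",
            f"  {out}[i] = {a}[i] + {b}[i]"]

def run_pipeline(op):
    fn = LOWERINGS.get((op["dialect"], op["name"]))
    if fn is None:
        raise KeyError(f"no lowering for {op['dialect']}.{op['name']}")
    return fn(op)

hl = {"dialect": "tensor", "name": "add",
      "shape": (4,), "args": ("A", "B"), "out": "C"}
print("\n".join(run_pipeline(hl)))
```

In real MLIR the same structure appears at scale: each dialect defines its ops and types, and conversion passes chain together until only ops of a hardware-level dialect remain.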
Auto-tuning
Develop strategies for automated kernel search and cost modeling to find optimal execution schedules.
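A minimal sketch of schedule search, assuming a deliberately simplified setup: candidate tile sizes are enumerated and ranked by a toy analytical cost model that penalizes working sets overflowing a hypothetical 192-float cache, with a milder penalty for underusing it. The cache size, penalty weights, and candidate set are all illustrative assumptions; real auto-tuners learn cost models from measured runtimes.

```python
def cost_model(tile, cache_floats=192):
    """Toy cost: heavy penalty when the three tiles (A, B, C blocks)
    overflow the assumed cache, mild penalty for poor reuse."""
    working_set = 3 * tile * tile
    if working_set > cache_floats:
        return (working_set - cache_floats) * 4   # overflow penalty
    return cache_floats - working_set             # underuse penalty

candidates = [2, 4, 8, 16, 32]
best = min(candidates, key=cost_model)
print(best)   # 8: 3*8*8 = 192 exactly fills the assumed cache
```

Production systems replace the exhaustive `min` with guided search (evolutionary or learned policies) because realistic schedule spaces contain millions of points, but the structure of generate, predict, select is the same.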
© 2026 ApX Machine Learning