High-performance processors often spend more time waiting for data than processing it. While techniques such as loop tiling aim to improve data reuse, ensuring that once data arrives in the cache it is used multiple times, the initial act of fetching that data from global memory to registers still incurs significant latency costs. If the arithmetic logic units (ALUs) sit idle while waiting for memory requests to complete, the hardware is underutilized.
Memory latency hiding is a set of optimization techniques designed to overlap memory operations with computation. The goal is to issue memory requests for future iterations while the processor is busy computing the current iteration. When successfully implemented, the execution time of a kernel shifts from being the sum of memory and compute times to the maximum of the two.
To understand the necessity of latency hiding, consider the time scale differences in hardware. A floating-point multiplication might take 4 to 6 clock cycles on a GPU. Fetching a float from global memory can take 400 to 800 cycles.
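Using rough midpoints of those ranges purely for illustration, a quick calculation shows how many independent multiplies would be needed to keep the ALU busy while a single load is outstanding:

# Back-of-the-envelope estimate using the illustrative figures above
multiply_latency = 5    # cycles per floating-point multiply (~4-6)
load_latency = 600      # cycles per global memory load (~400-800)

# Independent multiplies needed to cover one outstanding load
print(load_latency // multiply_latency)  # 120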
In a naive implementation, the execution flow is strictly serial:

1. Issue a load request for the operands.
2. Wait for the data to return from global memory.
3. Compute the result.
4. Store it back to memory.
During step 2, the compute units are stalled. To prevent this, compilers and hardware schedulers attempt to maintain a large number of "in-flight" memory requests. By identifying independent instructions, those that do not rely on the immediate result of a pending load, the compiler can rearrange code to keep the ALUs active.
The most fundamental form of latency hiding occurs within a single thread or loop body through instruction scheduling. The compiler analyzes the data dependencies in the Intermediate Representation (IR) and moves load instructions as early as possible in the execution stream.
Consider a simple vector addition loop:
# Naive Order
for i in range(N):
    a = A[i]     # Load
    b = B[i]     # Load
    c = a + b    # Compute (must wait for a, b)
    C[i] = c     # Store
If the compiler unrolls this loop, it can group the loads together. This technique is known as software pipelining or prefetching. By requesting data for iteration i+1 while processing iteration i, we mask the latency of the loads.
# Pipelined / Prefetched Order (Pseudo-code)
# Prologue: load the first elements before the loop starts
reg_a = A[0]
reg_b = B[0]

for i in range(N - 1):
    # Issue loads for the NEXT iteration immediately
    next_a = load_async(A[i + 1])
    next_b = load_async(B[i + 1])

    # Compute the CURRENT iteration while those loads are in flight
    c = reg_a + reg_b
    store(C[i], c)

    # Rotate registers for the next iteration
    reg_a = next_a
    reg_b = next_b

# Epilogue: handle the final iteration
c = reg_a + reg_b
store(C[N - 1], c)
This transformation changes the loop structure. It introduces a prologue (loading the first elements before the loop starts) and an epilogue (handling the final computation after the loop ends). Inside the steady-state loop, the memory subsystem fetches data for the future while the compute units work on data that is already available.
To visualize the impact, we can look at a timeline of operations. In the serialized version, the memory bus and the ALU take turns working. In the pipelined version, they operate simultaneously.
Comparison of serial execution versus software pipelining. In the pipelined version, the load for A[1] occurs simultaneously with the computation of A[0].
In the context of deep learning accelerators like GPUs and TPUs, latency hiding is often implemented via Double Buffering. This applies specifically when moving data between the global high-bandwidth memory (HBM) and faster on-chip memory (Shared Memory or Scratchpad).
Double buffering allocates two distinct memory regions (buffers) for the same data tile. While the computation kernel processes data from Buffer A, the Direct Memory Access (DMA) engine loads the next tile of data into Buffer B. Once both operations are complete, the roles are swapped: the kernel computes on Buffer B, and the DMA loads into Buffer A.
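A minimal pseudo-code sketch of this buffer rotation is shown below. The helpers alloc_shared, dma_load_async, wait, and compute_tile are hypothetical stand-ins for on-chip allocation, the DMA engine, and the kernel's inner math loop.

# Double buffering sketch (pseudo-code, hypothetical helpers)
buffers = [alloc_shared(TILE_BYTES), alloc_shared(TILE_BYTES)]

# Prologue: start filling Buffer 0 before any computation
pending = dma_load_async(src=tiles[0], dst=buffers[0])

for t in range(num_tiles):
    current = t % 2
    other = (t + 1) % 2

    # Start the DMA for the NEXT tile into the other buffer
    if t + 1 < num_tiles:
        next_pending = dma_load_async(src=tiles[t + 1], dst=buffers[other])

    # Wait only for the tile about to be consumed, then compute on it
    wait(pending)
    compute_tile(buffers[current])

    if t + 1 < num_tiles:
        pending = next_pending

The swap is implicit in the index arithmetic: on even iterations the kernel computes from Buffer 0 while Buffer 1 fills, and on odd iterations the roles reverse.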
This technique effectively allows the cost of memory transfer to be hidden behind the computation, provided the arithmetic intensity is high enough. If the time to compute a tile ($T_{\text{math}}$) is greater than the time to load a tile ($T_{\text{load}}$), the memory latency is completely hidden. The total time for $N$ tiles approximates:

$$T_{\text{total}} \approx T_{\text{load\_first}} + N \times \max(T_{\text{math}}, T_{\text{load}})$$

If $T_{\text{math}} > T_{\text{load}}$, the application is compute-bound. If $T_{\text{load}} > T_{\text{math}}$, the application remains memory-bound, but performance is still significantly better than the serialized version.
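As a quick worked example with made-up per-tile timings (not measurements), the formula gives:

# Hypothetical per-tile timings, purely illustrative
t_load = 10e-6    # seconds to move one tile from HBM to shared memory
t_math = 15e-6    # seconds to compute on one tile
n_tiles = 100

serial = n_tiles * (t_load + t_math)                 # 2.5 ms, no overlap
pipelined = t_load + n_tiles * max(t_math, t_load)   # ~1.51 ms, loads hidden
print(serial, pipelined)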
Modern hardware architectures provide specialized instructions to facilitate this pattern. For example, NVIDIA GPUs introduced cp.async (Async Copy) instructions. These commands initiate a copy from global memory to shared memory without blocking the execution thread.
When generating code for such targets, an ML compiler must:

1. Issue the non-blocking copy instructions early, so the transfer overlaps with independent computation.
2. Insert synchronization barriers (such as cp.async.wait_group or bar.sync) at the correct points in the loop to ensure data has arrived before the compute instructions attempt to read it.

Correct placement of these barriers is difficult. If the barrier is placed too early, the processor stalls, negating the benefit. If placed too late or omitted, the kernel reads uninitialized memory, leading to numerical errors.
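The following pseudo-code sketch contrasts an overly eager barrier with a correctly placed one. copy_async and wait_all are hypothetical stand-ins for cp.async-style copy and wait instructions, not actual PTX syntax.

# Barrier placement sketch (pseudo-code, hypothetical instructions)

# Too early: waiting immediately after issuing the copy serializes everything
copy_async(dst=buf_next, src=hbm_tile_next)
wait_all()                      # stalls here; nothing overlaps
compute(buf_current)

# Correct: issue the copy, do independent work, wait just before the data is needed
copy_async(dst=buf_next, src=hbm_tile_next)
compute(buf_current)            # overlaps with the in-flight copy
wait_all()                      # buf_next is required after this point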
Latency hiding also relies on Occupancy: the ratio of warps (or threads) that are resident and active on a multiprocessor to the maximum the hardware supports. Even with software pipelining, a single thread might eventually stall waiting for a value. Hardware schedulers mitigate this by instantly switching context to another thread that is ready to execute.
However, aggressive optimizations can hurt occupancy. Loop unrolling and double buffering increase the register and shared memory pressure per thread. If a kernel requires too many registers to store prefetched values, the hardware can support fewer active threads.
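To see the effect, consider a rough register-budget calculation. The register-file size and warp size below are assumptions for illustration, not a particular GPU's specification.

# Illustrative occupancy estimate (assumed hardware limits)
registers_per_sm = 65536    # assumed register file size per multiprocessor
warp_size = 32

def max_resident_warps(regs_per_thread):
    return registers_per_sm // (regs_per_thread * warp_size)

print(max_resident_warps(32))    # 64 warps can be resident
print(max_resident_warps(128))   # 16 warps: heavy prefetching cuts concurrency 4x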
When optimizing a loop nest, you must balance three factors:

1. Pipeline depth: how far ahead data is prefetched, which determines how much latency can be hidden.
2. Resource pressure: the registers and shared memory consumed by the extra buffers and unrolled iterations.
3. Occupancy: how many threads or warps can remain resident once each thread's footprint has grown.
Compilers often use a cost model to determine the optimal unroll factor and pipeline depth. For a matrix multiplication kernel, double buffering is the standard baseline, but triple buffering (using 3 buffers) is sometimes beneficial on architectures with extremely high compute-to-memory ratios.
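A toy version of such a cost model is sketched below; the heuristic, thresholds, and resource limits are illustrative assumptions rather than any real compiler's logic.

# Toy cost model for choosing pipeline depth (illustrative only)
def choose_pipeline_depth(tile_bytes, smem_budget, t_math, t_load, max_depth=3):
    # Two buffers suffice when compute covers the load; consider a third
    # stage only when loads dominate, and never exceed the shared-memory budget.
    depth = 2 if t_math >= t_load else 3
    depth = min(depth, max_depth, smem_budget // tile_bytes)
    return max(depth, 1)

# Example: 32 KB tiles, 128 KB of shared memory, load-bound timings
print(choose_pipeline_depth(32 * 1024, 128 * 1024, t_math=10e-6, t_load=14e-6))  # 3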