High-performance deep learning kernels are rarely limited by the arithmetic capability of the processor. Instead, they are bounded by the speed at which data moves from memory to the compute units. When compiling neural network graphs to hardware, the primary engineering challenge is masking memory latency. A standard NVIDIA A100 GPU can perform floating-point operations at a rate of hundreds of teraFLOPS, yet its global memory bandwidth is limited to roughly 1.5 to 2 TB/s. Without explicit management of the memory hierarchy, the compute units (Tensor Cores) stall, waiting for operands to arrive.
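To make the imbalance concrete, a rough ratio of the two rates (taking the A100's nominal dense FP16 Tensor Core throughput of about 312 TFLOPS and roughly 2 TB/s of memory bandwidth; both figures are approximate) gives the number of operations a kernel must perform per byte fetched just to stay compute-bound:

$$\frac{312 \times 10^{12}\ \text{FLOP/s}}{2 \times 10^{12}\ \text{B/s}} \approx 156\ \text{FLOP per byte}$$

Any kernel whose arithmetic intensity falls below this machine balance is memory-bound regardless of how efficient its inner loop is.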
Hardware-aware code generation requires mapping the abstract concept of a "loop tile" or "tensor slice" to specific physical memory spaces. The GPU memory hierarchy is a pyramid of increasing bandwidth and decreasing capacity. The compiler must determine which data resides in global memory, which subsets form the working set in shared memory, and which values sit in registers for immediate computation.
The GPU memory architecture exposes three distinct address spaces that a compiler must explicitly manage. Unlike a CPU where cache management is largely transparent to the software (hardware-managed), GPUs rely on the compiler to stage data explicitly into the faster, on-chip memory tiers.
The following diagram illustrates the relationship between the thread hierarchy and the memory hierarchy, showing visibility and access scopes.
The hierarchy dictates visibility and speed. Data flows from Global Memory through the L2 cache into Shared Memory, where it is accessible to a Thread Block, before being loaded into Registers for private thread execution.
The first step in hierarchy mapping is moving data from Global Memory to the chip. The efficiency of this movement depends on memory coalescing.
Global memory is accessed via transactions of 32, 64, or 128 bytes. When a warp (a group of 32 threads) executes a load instruction, the hardware examines the memory addresses requested by each thread. If the addresses are contiguous and aligned, the hardware coalesces these requests into the minimum number of transactions.
For example, if thread i of a warp accesses address base + 4i (loading a 32-bit float), the 32 threads together access a contiguous 128-byte block. This results in a single memory transaction, achieving 100% bus utilization.
However, if the access pattern is strided, for instance, accessing a column in a row-major matrix, the addresses are spread out. Thread 0 might access address base, and Thread 1 accesses base + 4 × width, where width is the number of elements per row. Even though each thread only needs 4 bytes, the hardware may fetch a full 32-byte segment for each thread to satisfy the request. This phenomenon leads to "bandwidth waste," where the effective throughput is a fraction of the theoretical peak.
Compilers typically handle this by performing vectorized loads. Instead of each thread loading a single scalar, the compiler emits instructions (such as ld.global.v4.f32 in PTX) where a thread loads a float4 vector. This increases the data transferred per instruction and helps amortize the cost of each memory request.
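The three access patterns described above can be sketched as minimal CUDA kernels; the kernel names and launch configuration are illustrative, not taken from any particular compiler's output:

```cuda
#include <cuda_runtime.h>

// Coalesced: consecutive threads read consecutive 4-byte words, so a warp's
// 32 requests collapse into a single 128-byte transaction.
__global__ void copy_coalesced(const float* __restrict__ in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // thread i -> address base + 4*i
    if (i < n) out[i] = in[i];
}

// Strided: each thread reads one element of column 0 in a row-major matrix,
// so consecutive threads touch addresses 4*cols bytes apart and each request
// tends to require its own transaction.
__global__ void copy_strided(const float* __restrict__ in, float* out, int rows, int cols) {
    int r = blockIdx.x * blockDim.x + threadIdx.x;   // thread r -> address base + 4*cols*r
    if (r < rows) out[r] = in[r * cols];
}

// Vectorized: each thread loads a float4 (16 bytes), which the backend lowers
// to a single 128-bit load (ld.global.v4.f32 in PTX).
__global__ void copy_vectorized(const float4* __restrict__ in, float4* out, int n4) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n4) out[i] = in[i];
}
```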
Once data is fetched from global memory, it must be stored where it can be reused efficiently. This is the role of Shared Memory. In deep learning operations like Matrix Multiplication (C = A × B), every element of A and B is accessed multiple times. Fetching them from global memory repeatedly would saturate bandwidth immediately.
The compiler implements Tiling (or blocking) by loading a sub-matrix (tile) of A and a tile of B into shared memory. The threads then compute the partial product using these fast-access tiles before moving on to the next pair of tiles.
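A minimal shared-memory tiling sketch for C = A × B follows; for brevity it assumes square row-major matrices whose dimension N is a multiple of the tile size:

```cuda
#define TILE 32  // one thread block computes a TILE x TILE block of C

__global__ void matmul_tiled(const float* A, const float* B, float* C, int N) {
    __shared__ float As[TILE][TILE];   // staging buffers in shared memory
    __shared__ float Bs[TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float acc = 0.0f;                  // accumulator lives in a register

    for (int k0 = 0; k0 < N; k0 += TILE) {
        // Each thread cooperatively loads one element of the A and B tiles.
        As[threadIdx.y][threadIdx.x] = A[row * N + (k0 + threadIdx.x)];
        Bs[threadIdx.y][threadIdx.x] = B[(k0 + threadIdx.y) * N + col];
        __syncthreads();               // tiles must be fully resident before use

        for (int k = 0; k < TILE; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();               // finish reads before overwriting the tiles
    }
    C[row * N + col] = acc;
}
```

Each element brought into shared memory is reused TILE times by the inner loop, which is exactly the reuse that repeated global-memory fetches would forfeit.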
The chart below highlights the stark difference in bandwidth and latency between these layers, emphasizing why the tiling buffer is necessary.
Shared memory offers an order-of-magnitude improvement in bandwidth compared to global memory. The compiler's goal is to maximize the ratio of compute operations performed per byte loaded into this tier.
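For the tiled scheme sketched above, assuming fp32 data and a square tile of edge T, each tile step performs 2T³ floating-point operations while loading 2T² elements (one tile of A and one of B):

$$\text{Intensity} = \frac{2T^{3}\ \text{FLOPs}}{2T^{2} \times 4\ \text{bytes}} = \frac{T}{4}\ \text{FLOP/byte}$$

The ratio grows linearly with the tile size, but T is capped in practice by the shared memory capacity available to each thread block and by register availability.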
To optimize this pipeline, modern compilers (like those using MLIR's NVGPU dialect) utilize asynchronous copy instructions (e.g., cp.async on NVIDIA Ampere and Hopper). These instructions allow the GPU to move data from Global Memory directly to Shared Memory without utilizing the register file as an intermediary. This reduces register pressure and allows the compute units to continue processing the current tile while the next tile is being loaded in the background (Software Pipelining).
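A sketch of how the global-to-shared copy from the tiled kernel could be expressed with the CUDA pipeline primitives (available on compute capability 8.0 and later); the helper name load_tile_async is hypothetical, and a production compiler would emit the equivalent cp.async PTX directly:

```cuda
#include <cuda_pipeline.h>   // __pipeline_memcpy_async and friends (CUDA 11+)

#define TILE 32

__device__ void load_tile_async(float (*As)[TILE], const float* A,
                                int row, int k0, int N) {
    // Each thread issues an asynchronous global->shared copy that bypasses the
    // register file. Copies must be 4, 8, or 16 bytes.
    __pipeline_memcpy_async(&As[threadIdx.y][threadIdx.x],
                            &A[row * N + (k0 + threadIdx.x)],
                            sizeof(float));
    __pipeline_commit();       // close the current batch of async copies
    __pipeline_wait_prior(0);  // block until all committed batches have landed
}
```

In a real pipeline the wait is deferred until just before the tile is consumed, so the copy overlaps with computation on the previous tile, as in the double-buffering sketch later in this section.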
The final stage of mapping is the Register File (RF). This is where the active accumulators for the matrix multiplication result, the elements of the output tile of C, reside.
Registers are the scarcest resource on the GPU. A single Streaming Multiprocessor has a large register file (e.g., 256 KB), but this space is partitioned among all resident threads. If a kernel generated by the compiler requires too many registers per thread, the scheduler must reduce the number of resident warps so that the allocation fits. This demand for registers is known as Register Pressure.
High register pressure leads to low Occupancy. Occupancy is the ratio of active warps to the maximum number of supported warps. If occupancy drops too low, there are not enough active threads to hide the latency of memory operations. If a warp stalls waiting for a global memory load, the scheduler needs another ready warp to switch to immediately.
When mapping IR to registers, the compiler performs Liveness Analysis to determine which variables can share the same physical register. In cases where the register requirement exceeds the hardware limit, the compiler is forced to "spill" registers to Local Memory. Despite the name, Local Memory resides physically in Global Memory (off-chip), so spilling results in a catastrophic performance penalty.
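Register usage and spilling can be inspected and constrained at compile time; the flags below are standard nvcc options, while the kernel name and the specific numbers are placeholders:

```cuda
// Report per-kernel register, shared memory, and spill statistics:
//   nvcc -arch=sm_80 -Xptxas -v kernel.cu
//
// Impose a hard per-thread register cap for the whole compilation unit,
// trading potential spills for higher occupancy:
//   nvcc -arch=sm_80 --maxrregcount=64 kernel.cu

// Per-kernel control via launch bounds: the compiler limits register
// allocation so that 256 threads per block fit, with at least 2 resident
// blocks per SM.
__global__ void __launch_bounds__(256, 2)
matmul_tiled_bounded(const float* A, const float* B, float* C, int N) {
    // ... same body as the tiled kernel above ...
}
```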
To maintain continuous execution, compilers implement Double Buffering (or multi-stage buffering).
A synchronization barrier (__syncthreads()) ensures the next tile is fully loaded before the loop repeats. By overlapping the memory latency of the next tile with the arithmetic work on the current tile, the compiler ensures that the Tensor Cores are never idle. This requires precise control over the instruction schedule, often necessitating low-level intermediates such as LLVM IR or direct PTX generation rather than high-level source code.
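A minimal double-buffered version of the tiled kernel is sketched below, using synchronous loads for clarity (with cp.async, the pipeline wait would take the place of the barrier that guards the freshly loaded buffer); N is again assumed to be a multiple of TILE:

```cuda
#define TILE 32

__global__ void matmul_double_buffered(const float* A, const float* B, float* C, int N) {
    // Two copies of each tile: compute from buffer 'buf' while filling 'buf ^ 1'.
    __shared__ float As[2][TILE][TILE];
    __shared__ float Bs[2][TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float acc = 0.0f;

    // Prologue: stage the first tile into buffer 0.
    As[0][threadIdx.y][threadIdx.x] = A[row * N + threadIdx.x];
    Bs[0][threadIdx.y][threadIdx.x] = B[threadIdx.y * N + col];
    __syncthreads();

    int buf = 0;
    for (int k0 = 0; k0 < N; k0 += TILE) {
        // Start loading the next tile into the other buffer while computing on 'buf'.
        if (k0 + TILE < N) {
            As[buf ^ 1][threadIdx.y][threadIdx.x] = A[row * N + (k0 + TILE + threadIdx.x)];
            Bs[buf ^ 1][threadIdx.y][threadIdx.x] = B[(k0 + TILE + threadIdx.y) * N + col];
        }

        for (int k = 0; k < TILE; ++k)
            acc += As[buf][threadIdx.y][k] * Bs[buf][k][threadIdx.x];

        __syncthreads();   // next buffer fully written before it becomes current
        buf ^= 1;
    }
    C[row * N + col] = acc;
}
```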