Optimize machine learning model performance through sophisticated compiler and runtime techniques. This course covers advanced intermediate representations, graph and tensor-level optimizations, code generation for heterogeneous hardware (CPUs, GPUs, accelerators), specialized runtime systems, JIT compilation strategies, and low-precision computation methods tailored for demanding AI applications.
Prerequisites: Extensive experience with compiler design principles (IRs, optimization passes, code generation), computer architecture (CPU/GPU internals, memory hierarchy), and machine learning framework internals (e.g., TensorFlow, PyTorch). Proficiency in C++ and Python.
Level: Expert
Advanced IR Design
Analyze and utilize sophisticated intermediate representations such as MLIR for expressing and optimizing complex ML computations.
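As a taste of what this involves, here is a minimal sketch of an SSA-style IR in the spirit of MLIR's named-operation design: values, operations with operands and attributes, and a textual printer. All class and op names here are illustrative, not MLIR's actual API.

```python
# Minimal SSA-style IR sketch: each Op produces one Value and can carry
# operands and attributes, loosely echoing MLIR's textual form.

class Value:
    """An SSA value with an auto-generated %N name."""
    _counter = 0
    def __init__(self):
        Value._counter += 1
        self.name = f"%{Value._counter}"

class Op:
    """An operation with operands, attributes, and a single result."""
    def __init__(self, name, operands=(), attrs=None):
        self.name = name
        self.operands = list(operands)
        self.attrs = attrs or {}
        self.result = Value()

    def __str__(self):
        args = ", ".join(v.name for v in self.operands)
        attrs = "".join(f" {{{k} = {v}}}" for k, v in self.attrs.items())
        return f'{self.result.name} = "{self.name}"({args}){attrs}'

# Build and print a tiny two-op program.
a = Op("ml.constant", attrs={"value": 2.0})
b = Op("ml.add", [a.result, a.result])
print(a)
print(b)
```

A real IR adds types, regions, and verification on top of this skeleton; the course covers how those pieces enable optimization passes.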
Graph-Level Optimization
Implement advanced graph optimization passes including operator fusion, layout transformations, and algebraic simplifications.
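To make operator fusion concrete, here is a minimal sketch of a fusion pass over a toy graph IR: chains of elementwise operators with a single consumer are merged into one fused node. The node structure and the `ELEMENTWISE` set are illustrative, not from any real framework.

```python
# Sketch of an operator-fusion pass: merge single-consumer chains of
# elementwise ops (add, mul, relu) into one fused node.

ELEMENTWISE = {"add", "mul", "relu"}

class Node:
    def __init__(self, name, op, inputs=()):
        self.name, self.op, self.inputs = name, op, list(inputs)

def fuse_elementwise_chains(nodes):
    """Greedily merge each elementwise producer into its sole elementwise consumer."""
    consumers = {}
    for n in nodes:
        for i in n.inputs:
            consumers.setdefault(i, []).append(n)
    fused, skip = [], set()
    for n in nodes:
        if n.name in skip:
            continue
        chain, cur = [n], n
        # Extend the chain while the current op is elementwise and feeds
        # exactly one elementwise consumer.
        while (cur.op in ELEMENTWISE
               and len(consumers.get(cur.name, [])) == 1
               and consumers[cur.name][0].op in ELEMENTWISE):
            nxt = consumers[cur.name][0]
            chain.append(nxt)
            skip.add(nxt.name)
            cur = nxt
        if len(chain) > 1:
            fused.append(Node(chain[-1].name,
                              "fused(" + "+".join(c.op for c in chain) + ")",
                              chain[0].inputs))
        else:
            fused.append(n)
    return fused

# matmul -> add -> relu collapses to matmul -> fused(add+relu).
graph = [Node("a", "matmul", ["x", "w"]),
         Node("b", "add", ["a", "bias"]),
         Node("c", "relu", ["b"])]
print([n.op for n in fuse_elementwise_chains(graph)])
```

Production compilers apply far richer legality checks (broadcast shapes, memory effects, multi-output fusion), but the chain-walking structure is the same.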
Tensor-Level Optimization
Apply polyhedral modeling, advanced loop transformations, and auto-vectorization techniques for tensor operations.
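Loop tiling (blocking) is one of the transformations studied here; the sketch below applies it by hand to a matrix multiply, splitting each loop into a tile loop and an intra-tile loop for cache locality. Sizes and the tile factor are illustrative.

```python
# Naive triple loop versus a tiled version of the same computation.

def matmul_naive(A, B, n):
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]
    return C

def matmul_tiled(A, B, n, t=2):
    """Same result: i/j/k are each split into a tile loop and an inner loop,
    so each t-by-t block of A and B is reused while it is cache-resident."""
    C = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, t):
        for jj in range(0, n, t):
            for kk in range(0, n, t):
                for i in range(ii, min(ii + t, n)):
                    for j in range(jj, min(jj + t, n)):
                        for k in range(kk, min(kk + t, n)):
                            C[i][j] += A[i][k] * B[k][j]
    return C

n = 4
A = [[float(i * n + j) for j in range(n)] for i in range(n)]
B = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # identity
assert matmul_tiled(A, B, n) == matmul_naive(A, B, n) == A
```

Polyhedral frameworks derive such transformations (and their legality) automatically from the loop nest's iteration domain rather than by hand.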
Heterogeneous Code Generation
Generate highly optimized code for diverse hardware targets including multi-core CPUs, GPUs (CUDA/ROCm), and specialized AI accelerators.
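A recurring structure in multi-target code generators is a per-backend lowering table; the sketch below shows the dispatch shape with string placeholders standing in for emitted code. Target names and the emitted strings are illustrative.

```python
# Sketch of target dispatch in a code generator: one lowering rule per
# backend, registered by target name.

LOWERINGS = {}

def lowering(target):
    """Decorator registering a lowering rule for one target."""
    def register(fn):
        LOWERINGS[target] = fn
        return fn
    return register

@lowering("cpu")
def lower_cpu(op):
    return f"call vectorized_{op}_avx2"

@lowering("cuda")
def lower_cuda(op):
    return f"launch {op}_kernel<<<grid, block>>>"

def codegen(op, target):
    if target not in LOWERINGS:
        raise ValueError(f"no lowering for target {target!r}")
    return LOWERINGS[target](op)

print(codegen("matmul", "cpu"))
print(codegen("matmul", "cuda"))
```

Real backends emit LLVM IR, PTX, or accelerator ISA instead of strings, but the registry pattern for adding a new target is much the same.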
ML Runtime Systems
Design and analyze runtime components for dynamic shape handling, efficient memory management, and heterogeneous task scheduling.
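One memory-management problem runtimes solve is static planning for intermediate tensors: tensors whose lifetimes do not overlap can share buffer space. The greedy offset assignment below is an illustrative sketch, not any particular runtime's planner.

```python
# Sketch of static memory planning: assign each tensor an offset in one
# arena, reusing space freed by tensors whose lifetimes have ended.

def plan_memory(tensors):
    """tensors: list of (name, first_use, last_use, size).
    Returns ({name: offset}, peak_bytes)."""
    placed = []   # (first_use, last_use, offset, size)
    offsets = {}
    for name, start, end, size in sorted(tensors, key=lambda t: -t[3]):
        # Regions occupied by tensors whose lifetimes overlap this one.
        busy = sorted((off, off + sz) for s, e, off, sz in placed
                      if not (e < start or end < s))
        offset = 0
        for lo, hi in busy:
            if offset + size <= lo:
                break                 # found a large-enough gap
            offset = max(offset, hi)  # skip past this occupied region
        placed.append((start, end, offset, size))
        offsets[name] = offset
    peak = max((off + sz for _, _, off, sz in placed), default=0)
    return offsets, peak

# "c" is live only after "a" dies, so it reuses a's space: peak is 64, not 96.
tensors = [("a", 0, 1, 32), ("b", 1, 2, 32), ("c", 2, 3, 32)]
offsets, peak = plan_memory(tensors)
print(offsets, peak)
```

Dynamic shapes complicate this picture, since sizes are only known at run time; that tension is a central topic of this unit.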
JIT Compilation for ML
Implement and analyze JIT compilation techniques for ML models, focusing on specialization and adaptive compilation.
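Shape specialization is easy to sketch: compile once per distinct input shape and cache the specialized kernel. In the illustrative code below, "compilation" just closes over the shape so loop bounds become constants; a real JIT would emit machine code at this point.

```python
# Sketch of shape specialization in an ML JIT: one compiled kernel per
# distinct input shape, cached so repeated shapes skip compilation.

def jit_specialize(kernel_builder):
    cache = {}
    stats = {"compiles": 0}
    def call(x):
        shape = (len(x), len(x[0]))
        if shape not in cache:
            stats["compiles"] += 1
            cache[shape] = kernel_builder(shape)  # specialize to this shape
        return cache[shape](x)
    call.stats = stats
    return call

def build_rowsum(shape):
    rows, cols = shape
    # With the shape fixed, the loop bounds are compile-time constants.
    def kernel(x):
        return [sum(x[i][j] for j in range(cols)) for i in range(rows)]
    return kernel

rowsum = jit_specialize(build_rowsum)
print(rowsum([[1, 2], [3, 4]]))  # first (2, 2) input triggers compilation
print(rowsum([[5, 6], [7, 8]]))  # same shape: cache hit, no recompilation
```

Adaptive compilation extends this idea with guards, tiered recompilation, and policies for when over-specialization starts to hurt.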
Low-Precision Optimization
Apply compiler and runtime techniques to support and optimize models using quantization and low-precision arithmetic.
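The arithmetic behind quantization is compact enough to show directly. This sketch implements symmetric int8 post-training quantization with a single scale; real schemes add per-channel scales, zero points, and calibration, which the course treats in depth.

```python
# Sketch of symmetric int8 quantization: map floats to [-127, 127]
# with one scale chosen from the largest magnitude.

def quantize_int8(values):
    scale = max(abs(v) for v in values) / 127.0 or 1.0  # 1.0 for all-zero input
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    return [qi * scale for qi in q]

weights = [0.5, -1.27, 0.02, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# Rounding error is bounded by half a quantization step (scale / 2).
assert all(abs(w - r) <= scale / 2 + 1e-9 for w, r in zip(weights, restored))
print(q, scale)
```

The compiler's job is then to propagate these scales through the graph and emit integer kernels, keeping dequantize/requantize steps off the hot path.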
Performance Analysis
Utilize advanced profiling tools to diagnose performance bottlenecks in compiled ML code execution.
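Underneath every ML profiler is per-operator measurement and aggregation; the sketch below shows that core with a context manager. The class and method names are illustrative, not any real tool's API.

```python
# Sketch of per-operator timing: record wall time per op name and rank
# operators by total time to find hotspots.
import time
from collections import defaultdict
from contextlib import contextmanager

class OpProfiler:
    def __init__(self):
        self.times = defaultdict(float)   # op name -> total seconds
        self.counts = defaultdict(int)    # op name -> invocation count

    @contextmanager
    def record(self, name):
        t0 = time.perf_counter()
        try:
            yield
        finally:
            self.times[name] += time.perf_counter() - t0
            self.counts[name] += 1

    def hotspots(self):
        """Operator names sorted by total time, hottest first."""
        return sorted(self.times, key=self.times.get, reverse=True)

prof = OpProfiler()
with prof.record("matmul"):
    sum(x * x for x in range(100000))  # stand-in for a heavy kernel
with prof.record("relu"):
    pass
print(prof.hotspots())
```

Hardware profilers (perf, Nsight, VTune) add counters and kernel traces on top of this timing skeleton; correlating their output back to graph operators is a key skill this unit develops.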
© 2025 ApX Machine Learning