Standard execution modes in machine learning frameworks, such as eager execution and Ahead-of-Time (AOT) compilation, each make a fixed trade-off between flexibility and optimization potential. Just-In-Time (JIT) compilation takes a different approach, deferring final code generation until runtime. This lets the compiler apply optimizations informed by dynamic runtime context, such as the tensor shapes (e.g., B×C×H×W) or values actually observed during execution.
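To make the idea concrete, here is a minimal sketch using PyTorch's `torch.jit.trace` (examined in detail later in this chapter). The helper function `scale_and_sum` is purely illustrative: tracing executes it once with a real tensor, so the recorded graph reflects the shape and dtype the runtime actually observed.

```python
import torch

def scale_and_sum(x):
    # A small computation the JIT can specialize once it observes
    # a concrete input at runtime.
    return (x * 2.0).sum(dim=-1)

# Tracing executes the function with a real tensor, so the recorded
# graph reflects the dtype and shape observed at call time (here 8x16).
example = torch.randn(8, 16)
traced = torch.jit.trace(scale_and_sum, example)

# The captured IR is annotated with the shape information seen
# during tracing, which downstream passes can exploit.
print(traced.graph)
```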
This chapter covers JIT compilation methods relevant to ML workloads. We will analyze graph acquisition techniques such as tracing and scripting, examine the requirements JIT systems place on their intermediate representations, and study runtime specialization and adaptive compilation strategies. The architecture and operation of two prominent JIT systems, TensorFlow XLA and PyTorch JIT (TorchScript), serve as practical case studies.
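As a preview of the tracing-versus-scripting distinction, the sketch below (using a hypothetical helper `relu_clip`) contrasts the two acquisition strategies: tracing records only the operations executed for one example input, so data-dependent branches are baked in, whereas scripting compiles the Python source and preserves control flow.

```python
import torch

def relu_clip(x, limit: float = 1.0):
    # Data-dependent control flow: which branch runs depends on
    # the values inside x, not just its shape.
    if x.max() > limit:
        return x.clamp(max=limit)
    return x

# Tracing records only the branch taken for this example input
# (here x.max() == 2.0 > limit, so the clamp branch is baked in).
traced = torch.jit.trace(relu_clip, torch.full((4,), 2.0))

# Scripting compiles the Python source, preserving both branches.
scripted = torch.jit.script(relu_clip)

print(scripted.code)  # TorchScript source with the `if` intact
```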
7.1 Motivation for JIT Compilation in ML
7.2 Tracing vs. Scripting Approaches
7.3 Intermediate Representation in JIT Systems
7.4 Runtime Specialization and Polymorphism
7.5 Profile-Guided Optimization (PGO) in JITs
7.6 Adaptive and Multi-Tier Compilation
7.7 Case Study: TensorFlow XLA
7.8 Case Study: PyTorch JIT (TorchScript)
7.9 Hands-on Practical: Analyzing JIT Compiled Code