TVM: An Automated End-to-End Optimizing Compiler for Deep Learning, Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin Zheng, Eddie Yan, Haichen Shen, Meghan Cowan, Leyuan Wang, Yuwei Hu, Luis Ceze, Carlos Guestrin, Arvind Krishnamurthy, 2018, 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18) (USENIX Association) - Describes an end-to-end deep learning compilation and runtime stack, illustrating the interplay between compiler and runtime for diverse hardware.
ONNX Runtime Overview, Yue-Sheng Liu, 2024 (ONNX Runtime) - Official documentation providing an architectural overview of a widely used ML inference runtime, detailing its components and how they interact.
IREE Compiler and Runtime Architecture, IREE Team, 2024 (LF AI & Data Foundation) - Documentation detailing the design principles and components of a modern, multi-target ML compiler and runtime system from a prominent open-source project.
Deep Learning Systems: Algorithms, Compilers, and Processors for Efficient Intelligence, E. Elsen, H. F. Ding, C. J. Ding, S. Gupta, D. K. Kim, V. J. Lee, J. R. Li, D. C. Nellans, S. V. Smith, 2020 (Cambridge University Press) - Provides a comprehensive treatment of deep learning systems, with relevant sections on runtime architectures, memory management, and execution engines.