TVM: An End-to-End Optimizing Compiler for Deep Learning, Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin Zheng, Eddie Yan, Haichen Shen, Meghan Cowan, Leyuan Wang, Yuwei Hu, Luis Ceze, Carlos Guestrin, Arvind Krishnamurthy, 2018, 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18) (USENIX Association), DOI: 10.5555/3291244.3291295 - Introduces Apache TVM, an open-source deep learning compiler stack that automatically optimizes and generates code for diverse hardware backends, covering many optimizations discussed.
MLIR: A Compiler Infrastructure for the End of Moore's Law, Chris Lattner, Mehdi Amini, Uday Bondhugula, Albert Cohen, Andy Davis, Jacques Pienaar, River Riddle, Tatiana Shpeisman, Nicolas Vasilache, Oleksandr Zinenko, 2021, 2021 IEEE/ACM International Symposium on Code Generation and Optimization (CGO) (IEEE), DOI: 10.1109/CGO51591.2021.9369931 - Describes MLIR, a flexible compiler infrastructure that serves as a foundation for building domain-specific compilers for machine learning, including representations and transformations.
NVIDIA TensorRT Developer Guide, NVIDIA Corporation, 2024 - Official documentation providing practical details on using TensorRT for optimizing and deploying deep learning models, including specifics on graph optimizations and hardware targeting.
XLA: Accelerated Linear Algebra, TensorFlow team, 2024 (Google) - Official documentation describing XLA, Google's optimizing compiler for TensorFlow and JAX, which provides graph optimizations for CPUs, GPUs, and TPUs.