The LLVM Target-Independent Code Generator, Chris Lattner, Vikram Adve, 2004. Proceedings of the ACM SIGPLAN 2004 Conference on Programming Language Design and Implementation (PLDI), ACM. DOI: 10.1145/996841.996848. - Presents the architecture of LLVM's code generator, detailing the SelectionDAG framework for instruction selection, which is widely used in modern compilers.
MLIR: A Compiler Infrastructure for the End of Moore's Law, Chris Lattner, Mehdi Amini, Uday Bondhugula, Albert Cohen, Andy Davis, Jacques Pienaar, River Riddle, Tatiana Shpeisman, Nicolas Vasilache, Oleksandr Zinenko, 2021. Proceedings of the 2021 IEEE/ACM International Symposium on Code Generation and Optimization (CGO), IEEE/ACM. DOI: 10.1109/CGO51591.2021.9370308. - Introduces MLIR, a multi-level IR framework for heterogeneous hardware, explaining its dialect system and progressive lowering approach, which are crucial for instruction selection in ML compilers.
Tensor Comprehensions: Framework-Agnostic High-Performance ML for GPUs, Nicolas Vasilache, Oleksandr Zinenko, Sam Gross, Zachary DeVito, Vaibhav Nagarajan, Jeremy Appleyard, Eric Johnson, Sirui Xie, Annie Liu, Lisa Lee, Tuanmho Nguyen, Andrew Adams, Jiri Simsa, Jeff Springer, Michael Garland, Trevor Elliott, Stephen Neuendorffer, Vijay Janapa Reddi, David Wong, Emina Torlak, Jonathan Ragan-Kelley, 2018. Proceedings of the ACM on Programming Languages, Vol. 2 (OOPSLA), ACM. DOI: 10.1145/3276483.3276495. - Describes a system for generating high-performance GPU code for deep learning, focusing on how it identifies and targets specialized hardware features such as NVIDIA Tensor Cores.
A Survey of Deep Learning Compilers, Guanhua Wang, Jiansong Li, Chengyu Dong, Gang Li, Mengze Li, Yu Chen, 2021. ACM Computing Surveys, Vol. 54, ACM. DOI: 10.1145/3448375. - Offers a comprehensive overview of deep learning compiler architectures and challenges, including code generation, hardware-specific optimizations, and targeting diverse devices.