TVM: An Automatic End-to-End Optimizing Compiler for Deep Learning, Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin Zheng, Eddie Yan, Haichen Shen, Meghan Cowan, Leyuan Wang, Yuwei Hu, Luis Ceze, Carlos Guestrin, and Arvind Krishnamurthy, 2018. Proceedings of the 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI '18) (USENIX Association). DOI: 10.1145/3294344.3259969 - Details graph-level optimizations, including operator fusion techniques and cost modeling, implemented in the TVM deep learning compiler framework.
AOT compilation with XLA, TensorFlow Authors, 2024. Official documentation explaining Ahead-Of-Time (AOT) compilation with XLA, highlighting how it applies graph-level optimizations such as aggressive operator fusion to improve performance.
Graph-Level Optimization for Deep Learning Compilers: A Survey, Tan Wang, Peng Li, Jinchen Wu, Junhao Wen, Zhaoyang Zhang, and Yu Zhang, 2019. ACM Computing Surveys, Vol. 52 (Association for Computing Machinery). DOI: 10.1145/3342371 - Comprehensive survey of graph-level optimizations in deep learning compilers, dedicating sections to various operator fusion strategies and their underlying principles.
TensorFlow: A System for Large-Scale Machine Learning, Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, Manjunath Kudlur, Josh Levenberg, Rajat Monga, Sherry Moore, Derek G. Murray, Benoit Steiner, Paul Tucker, Vijay Vasudevan, Pete Warden, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng, 2016. Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI '16) (USENIX Association). DOI: 10.1145/2996901.2996906 - Foundational paper introducing the TensorFlow system architecture, providing essential context for understanding subsequent graph optimization components like XLA and their fusion capabilities.