TVM: An Automatic End-to-End Optimizing Compiler for Deep Learning, Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin Zheng, Eddie Yan, Haichen Shen, Meghan Cowan, Leyuan Wang, Yuwei Hu, Luis Ceze, Carlos Guestrin, and Arvind Krishnamurthy, 2018. Proceedings of the 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI '18) (USENIX Association). DOI: 10.1145/3294344.3259969 - Describes the TVM deep learning compiler framework, including its graph-level optimizations such as operator fusion, and its cost model.
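AOT compilation with XLA, TensorFlow Authors, 2024 - Official documentation on XLA's ahead-of-time (AOT) compilation, highlighting how graph optimizations such as aggressive operator fusion improve performance.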
TensorFlow: A System for Large-Scale Machine Learning, Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, Manjunath Kudlur, Josh Levenberg, Rajat Monga, Sherry Moore, Derek G. Murray, Benoit Steiner, Paul Tucker, Vijay Vasudevan, Pete Warden, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng, 2016. Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI '16) (USENIX Association). DOI: 10.1145/2996901.2996906 - Foundational paper on the TensorFlow system architecture, providing the background needed to understand later graph-optimization components such as XLA and their fusion capabilities.