TVM: An Automatic End-to-End Optimizing Compiler for Deep Learning, Tianqi Chen, Lianmin Li, Ekanathan Palamadai, Ziheng Jiang, Haichen Shen, Agustín Cosío, Jonathan Romero, Luis Vega, Jared Roesch, Zhiqiang Xie, Sheng Zha, Yuwei Hu, Haonan Li, Mu Li, Chris Zhu, Jaraslaw Zola, Steven Lyubomirsky, Shizhi Tang, Alec Plunkett, Animesh Agarwal, Amir Gholami, Younes Fraout, Andrew Psota, Alex R. Shinn, Justin Rising, Hongyi Xin, Young-min Kim, Vinod Grover, Bo Dong, Jon F. O'Boyle, Yuandong Tian, Yida Wang, Thierry Moreau, Zhihao Jia, Zachary DeVito, Michael O'Connor, Wei Chen, Deepak Kumar, Masahiro Tanaka, Yi Yang, Junjie Bai, Joshua Auerbach, Michael Garland, Jeff Dean, Jonathan Frankle, Greg Striemer, Chris Lattner, Zachary L. Li, 2019. ACM Transactions on Architecture and Code Optimization (TACO), Vol. 16 (ACM). DOI: 10.1145/3342048.3301047 - Explains the architecture and optimization methods of a representative deep learning compiler, and shows how high-level operations are lowered to low-level kernels.
Demystifying GPU Performance for Deep Learning, Zhihao Jia, Yi Chung, Liqiang Xie, Yuanzhou Yang, Yuandong Tian, and Mu Li, 2018. Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP '18) (ACM). DOI: 10.1145/3178243.3178253 - Presents methods for analyzing and optimizing deep learning performance on GPUs, helping readers handle the practical challenges of interpreting profiling data.