TVM: An Automated End-to-End Optimizing Compiler for Deep Learning, Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin Zheng, Eddie Yan, Haichen Shen, Meghan Cowan, Leyuan Wang, Yuwei Hu, Luis Ceze, Carlos Guestrin, Arvind Krishnamurthy, 2018. 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18) (USENIX Association). DOI: 10.5555/3292415.3292440 - Describes TVM, a deep learning compiler that automates operator optimization, including graph-level transformations and tensor-level code generation for various hardware, addressing the challenges of hardware specialization and optimization scope.
In-Datacenter Performance Analysis of a Tensor Processing Unit, Norman P. Jouppi, Cliff Young, Nishant Patil, David Patterson, Gaurav Agrawal, Raminder Bajwa, et al., 2017. 44th Annual International Symposium on Computer Architecture (ISCA). DOI: 10.48550/arXiv.1704.04760 - Presents the architecture and performance analysis of Google's Tensor Processing Unit (TPU), highlighting how specialized hardware is designed for ML workloads and the implications for compilers targeting such accelerators.