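NVIDIA Volta Architecture In-Depth, Stephen Jones and David B. Kirk, 2017 (NVIDIA Corporation) - A detailed technical whitepaper describing the NVIDIA Volta GPU architecture, including the introduction and design rationale of Tensor Cores for deep learning acceleration.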
TVM: An Automated End-to-End Optimizing Compiler for Deep Learning, Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin Zheng, Eddie Yan, Haichen Shen, Meghan Cowan, Leyuan Wang, Yuwei Hu, Luis Ceze, Carlos Guestrin, Arvind Krishnamurthy, 2018, 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18) (USENIX Association) - Introduces TVM, an open-source deep learning compiler stack that automatically optimizes and generates code for a range of hardware backends, including CPUs, GPUs, and specialized accelerators; covers the software mapping side.