NVIDIA Volta Deep Learning Architecture Whitepaper, NVIDIA Corporation, 2017 (NVIDIA) - Introduces the Volta GPU architecture, detailing the design and purpose of Tensor Cores for mixed-precision deep learning.
In-Datacenter Performance Analysis of a Tensor Processing Unit, Norman P. Jouppi, Cliff Young, Nishant Patil, David Patterson, et al., 2017, ISCA '17: Proceedings of the 44th Annual International Symposium on Computer Architecture (ACM and IEEE Computer Society), DOI: 10.1145/3079856.3080246 - Analyzes the architecture of Google's Tensor Processing Unit (TPU), outlining its design and its measured performance on machine learning workloads in the datacenter.
FP8 Formats for Deep Learning, Paulius Micikevicius, Dusan Stosic, Patrick Judd, John Kamalu, Stuart Oberman, Mohammad Shoeybi, Michael Siu, Neil Burgess, Sangwon Ha, Richard Grisenthwaite, Naveen Mellempudi, Marius Cornea, Alexander Heinecke, Pradeep Dubey, 2022 (NVIDIA, Arm, Intel) - Describes the E4M3 and E5M2 FP8 formats proposed jointly by NVIDIA, Arm, and Intel, their application in deep learning, and hardware support on architectures like Hopper.
NVIDIA TensorRT Documentation, NVIDIA, 2024 (NVIDIA) - Covers using TensorRT to optimize and deploy deep learning models, including how to take advantage of low-precision hardware capabilities.