In-Datacenter Performance Analysis of a Tensor Processing Unit, Norman P. Jouppi, Cliff Young, Nishant Agrawal, Michael Broomhall, Raymond Chou, Kaijie Dai, Manoj Gelb, Al Gleason, Chris Horton, Veri Jones, Gerard Jourdan, Samuel Knag, Mike Larson, George Ma, Andy Newman, H. Fred Pugsley, Brian R. Riley, David Ross, Alan Smith, Kourosh Taraporewalla, Valentine Turner, Norman Underwood, Chunqiang Xu, Bert Van Zee, and Wolfgang Wang, 2017. ACM SIGARCH Computer Architecture News, Vol. 45 (ACM). DOI: 10.1145/3143890.3140600 - Presents the architecture and performance of the first-generation Google TPU, detailing the Matrix Multiply Unit and its systolic array design for machine learning acceleration.
Google's TPU v4: A Domain-Specific Architecture for Modern Machine Learning, Andrew W. Norrie, Derek Bruening, Scott P. Callaway, Patrick W. D. Chi, Nicholas R. Johnson, Alex K. K. Lee, Yanzhi Wang, Jason H. Yoon, Cliff Young, and Norman P. Jouppi, 2023. Proceedings of the 50th Annual International Symposium on Computer Architecture (ISCA) (IEEE). DOI: 10.1109/ISCA55941.2023.00063 - Describes the architectural improvements of the TPU v4, including its pod architecture and high-bandwidth interconnect, which support large-scale distributed training for machine learning.
PyTorch/XLA Documentation, PyTorch Contributors, 2024 - Official guide for using PyTorch with Google TPUs via the XLA compiler, covering installation, setup, and programming practices.
Tensor Processing Units (TPUs), Google Cloud Documentation, 2024 (Google Cloud) - Official Google Cloud resource providing an overview of TPU generations, features, and how to access and utilize them for machine learning workloads.