In-Datacenter Performance Analysis of a Tensor Processing Unit, Jouppi, Norman P., Cliff Young, Nishant Agrawal, Miachael Broomhall, Raymond Chou, Kaijie Dai, Manoj Gelb, Al Gleason, Chris Horton, Veri Jones, Gerard Jourdan, Samuel Knag, Mike Larson, George Ma, Andy Newman, H. Fred Pugsley, Brian R. Riley, David Ross, Alan Smith, Kourosh Taraporewalla, Valentine Turner, Norman Underwood, Chunqiang Xu, Bert Van Zee, and Wolfgang Wang, 2017. ACM SIGARCH Computer Architecture News, Vol. 45 (ACM). DOI: 10.1145/3143890.3140600 - Introduces the architecture and performance of the first-generation Google TPU, detailing the matrix multiply unit used for machine learning acceleration and its systolic array design.