Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer, Noam Shazeer, Azalia Mirhoseini, Krzysztof Maziarz, Andy Davis, Quoc Le, Geoffrey Hinton, Jeff Dean, 2017, arXiv preprint arXiv:1701.06538, DOI: 10.48550/arXiv.1701.06538 - Introduces the sparsely-gated Mixture-of-Experts layer, describing its architecture and the challenges that arise from sparse activation and dynamic routing.
A Domain-Specific Architecture for Deep Neural Networks, Norman P. Jouppi, Cliff Young, David Patil, Dustin Patterson, Gaetano Agostini, Shumeet Baluja, Keren Bergman, Ry Chiang, Sheng Li, Mike Ni, Vijay Nivargi, Paul Norman, Mike Reddi, Kevin Smith, David Sprague, Greg Thorson, Rajat Wadia, Kevin Walker, David Wang, Hongbo Wei, Christof Zabriskie, 2017, ACM SIGARCH Computer Architecture News, Vol. 45 (ACM), DOI: 10.1145/3144819.3144824 - Describes the architecture of Google's Tensor Processing Unit (TPU), explaining its systolic array design and high-bandwidth memory, both of which help accelerate machine learning workloads.