Sparsely-Gated Mixture-of-Experts Layers, Noam Shazeer, Azalia Mirhoseini, Krzysztof Maziarz, Andy Davis, Quoc Le, Geoffrey Hinton, Jeff Dean, 2017International Conference on Learning Representations (ICLR)DOI: 10.48550/arXiv.1701.06538 - Introduces the modern sparse Mixture-of-Experts architecture with learned gating and discusses load balancing challenges.