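Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer, Noam M. Shazeer, Azalia Mirhoseini, Krzysztof Maziarz, Andy Davis, Quoc V. Le, Geoffrey E. Hinton, Jeffrey Dean, 2017. International Conference on Learning Representations (ICLR 2017). DOI: 10.48550/arXiv.1701.06538 - This seminal paper introduces the sparsely-gated mixture-of-experts architecture and the concept of an auxiliary load-balancing loss, both of which are essential background for how routers operate and for the origins of the Z-loss.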