Sparsely-Gated Mixture-of-Experts Layers. Noam Shazeer, Azalia Mirhoseini, Krzysztof Maziarz, Andy Davis, Quoc Le, Geoffrey Hinton, Jeff Dean. 2017. Advances in Neural Information Processing Systems, Vol. 30 (Neural Information Processing Systems Foundation, Inc.). DOI: 10.48550/arXiv.1701.06538 - a foundational work introducing sparsely gated mixture-of-experts layers into deep learning; it discusses the challenges of training MoE models and emphasizes expert utilization and balanced routing.