Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer, Noam Shazeer, Azalia Mirhoseini, Krzysztof Maziarz, Andy Davis, Quoc Le, Geoffrey Hinton, and Jeff Dean, 2017, International Conference on Learning Representations (ICLR), DOI: 10.48550/arXiv.1701.06538 - This foundational paper introduces the sparsely-gated Mixture-of-Experts (MoE) layer and proposes an auxiliary loss for balancing expert load, directly addressing the problem described in the text (a minimal sketch of such a loss follows after this list).
Router Argumentation for Mixture-of-Experts, Koustuv Sinha, Michael Noukhovitch, Subhabrata Roy, Karthik Srinivasan, William Fedus, Michael Ryoo, and Yoshua Bengio, 2022, International Conference on Learning Representations (ICLR), DOI: 10.48550/arXiv.2202.04944 - This paper proposes methods for improving the gating network's routing decisions; by making routing more robust, it directly contributes to better load balancing and expert specialization.
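Below is a minimal, hypothetical sketch of the kind of auxiliary load-balancing loss described in the first reference: it penalizes the squared coefficient of variation of the per-expert gate mass so the router is pushed toward spreading tokens evenly. The function name `load_balancing_aux_loss`, the default weight, and the PyTorch framing are illustrative assumptions, not the paper's exact implementation.

```python
import torch

def load_balancing_aux_loss(gate_probs: torch.Tensor, weight: float = 0.01) -> torch.Tensor:
    """Penalize uneven expert usage (illustrative sketch, not the paper's exact code).

    gate_probs: (num_tokens, num_experts) softmax outputs of the gating network.
    Each expert's "importance" is the total gate mass it receives over the batch;
    the loss is the squared coefficient of variation of that importance, scaled by `weight`.
    """
    importance = gate_probs.sum(dim=0)                        # total gate mass per expert
    cv_squared = importance.var(unbiased=False) / (importance.mean() ** 2 + 1e-10)
    return weight * cv_squared

# Example: near-uniform gate_probs give a near-zero loss, while routing every
# token to a single expert gives a large loss, discouraging expert collapse.
gate_probs = torch.softmax(torch.randn(32, 8), dim=-1)        # 32 tokens, 8 experts
print(load_balancing_aux_loss(gate_probs))
```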