Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin, 2017. Advances in Neural Information Processing Systems (NeurIPS) 30 (Curran Associates, Inc.). DOI: 10.48550/arXiv.1706.03762 - Introduces the Transformer architecture, whose feed-forward network (FFN) block serves as the structural template for the individual expert networks in modern MoE models.