Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin, 2017. Advances in Neural Information Processing Systems (NeurIPS) 30 (Curran Associates, Inc.). DOI: 10.48550/arXiv.1706.03762 - Introduces the Transformer architecture, whose Feed-Forward Network (FFN) block provides the structural template for individual expert networks in modern MoE models.