Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, Illia Polosukhin, 2017. Advances in Neural Information Processing Systems 30 (NIPS 2017), Vol. 30 (Curran Associates, Inc.). DOI: 10.55917/cbdd4778 - The paper introducing the Transformer architecture, detailing its components including the standard feed-forward network (FFN) that MoE layers replace.