Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, Illia Polosukhin, 2017. Advances in Neural Information Processing Systems 30 (Curran Associates, Inc.). DOI: 10.48550/arXiv.1706.03762 - This foundational paper introduces the Transformer architecture and describes the structure and role of its position-wise feed-forward network, including the common parameter setting d_ff = 4 * d_model.