Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, Illia Polosukhin, 2017. Advances in Neural Information Processing Systems, Vol. 30 (NeurIPS). DOI: 10.5555/3295222.3295349 - Introduces the Transformer architecture and the multi-head attention mechanism, which form the foundation of modern large language models.