Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, 2017Advances in Neural Information Processing Systems 30 (NeurIPS 2017)DOI: 10.48550/arXiv.1706.03762 - 介绍了Transformer模型和多头注意力机制,详细阐述了Q、K、V向量的线性投影。