Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, 2017. Advances in Neural Information Processing Systems 30 (NeurIPS 2017). DOI: 10.48550/arXiv.1706.03762 - This foundational paper introduces the Transformer architecture, which relies entirely on attention mechanisms, and explicitly defines how the context vector is computed as a weighted sum of the Value vectors.