Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, Illia Polosukhin, 2017. Advances in Neural Information Processing Systems, Vol. 30 (Curran Associates, Inc.). DOI: 10.5555/3295222.3295289 - Introduces the Transformer architecture, including the scaled dot-product attention mechanism and the need for positional encodings that arises from its permutation invariance.
Deep Sets, Manzil Zaheer, Satwik Kottur, Siamak Ravanbakhsh, Barnabas Poczos, Ruslan Salakhutdinov, Alexander J Smola, 2017. Advances in Neural Information Processing Systems (NeurIPS), Vol. 30 (Curran Associates, Inc.). DOI: 10.5555/3295222.3295328 - A foundational paper that characterizes permutation-invariant neural networks mathematically, providing theoretical background for why basic self-attention treats its input as a set.