Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin, 2017. Advances in Neural Information Processing Systems, Vol. 30 (Neural Information Processing Systems). DOI: 10.5555/3295222.3295349 - Foundational paper introducing the Transformer architecture and the self-attention mechanism; emphasizes its quadratic complexity with respect to sequence length.
Big Bird: Transformers for Longer Sequences, Manzil Zaheer, Guru Guruganesh, Kumar Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, and Amr Ahmed, 2020. Advances in Neural Information Processing Systems, Vol. 33 (Curran Associates, Inc.). DOI: 10.5555/3455793.3455928 - Proposes BigBird, a sparse attention mechanism combining global, local, and random attention to achieve linear complexity while maintaining strong performance on long-sequence tasks.