Big Bird: Transformers for Longer Sequences, Manzil Zaheer, Guru Guruganesh, Kumar Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed, 2020. Advances in Neural Information Processing Systems, Vol. 33 (Curran Associates, Inc.). DOI: 10.48550/arXiv.2007.14062 - Describes the Big Bird model in detail: a sparse, efficient Transformer variant that combines local, global, and random attention mechanisms.