Rethinking Attention with Performers, Krzysztof Choromanski, Valerii Likhosherstov, David Dohan, Xingyou Song, Andreea Gane, Tamas Sarlos, Peter Hawkins, Jared Davis, Afroz Mohiuddin, Lukasz Kaiser, David Belanger, Lucy Colwell, Adrian Weller, 2021, International Conference on Learning Representations (ICLR). DOI: 10.48550/arXiv.2009.14794 - Introduces the Performer model, which approximates the softmax attention kernel with orthogonal random features (FAVOR+), reducing attention to linear time and memory complexity.
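To make the kernel-approximation idea concrete, below is a minimal sketch of random-feature linear attention in the spirit of FAVOR+. It is an illustration under simplifying assumptions, not the paper's implementation: the projection matrix is plain Gaussian (the orthogonalization step of FAVOR+ is omitted), and the names `positive_random_features`, `linear_attention`, and `num_features` are placeholders chosen for this example.

```python
import numpy as np

def positive_random_features(x, proj):
    # Positive features for the softmax kernel (FAVOR+ style):
    # phi(x) = exp(x W^T - ||x||^2 / 2) / sqrt(m)
    m = proj.shape[0]
    sq_norm = 0.5 * np.sum(x ** 2, axis=-1, keepdims=True)
    return np.exp(x @ proj.T - sq_norm) / np.sqrt(m)

def linear_attention(Q, K, V, num_features=64, seed=0):
    # Approximate softmax attention in O(L * m * d) instead of O(L^2 * d):
    # softmax(QK^T)V  ~=  phi(Q) (phi(K)^T V) / (phi(Q) (phi(K)^T 1))
    L, d = Q.shape
    rng = np.random.default_rng(seed)
    proj = rng.normal(size=(num_features, d))   # random (not orthogonalized) projections
    scale = d ** -0.25                          # split the 1/sqrt(d) temperature between Q and K
    q_prime = positive_random_features(Q * scale, proj)   # (L, m)
    k_prime = positive_random_features(K * scale, proj)   # (L, m)
    kv = k_prime.T @ V                                     # (m, d_v), computed once
    normalizer = q_prime @ k_prime.sum(axis=0)             # (L,)
    return (q_prime @ kv) / normalizer[:, None]
```

Because `phi(K)^T V` is computed once and reused for every query row, cost grows linearly in sequence length L rather than quadratically, which is the property the paper exploits.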
Long-Range Arena: A Benchmark for Efficient Transformers, Yi Tay, Mostafa Dehghani, Samira Abnar, Yikang Shen, Dara Bahri, Philip Pham, Jinfeng Rao, Liu Yang, Sebastian Ruder, Donald Metzler, 2020, International Conference on Learning Representations (ICLR). DOI: 10.48550/arXiv.2011.04006 - Introduces a standardized benchmark for evaluating the efficiency and accuracy of Transformer architectures, including linear-attention variants, on long-range dependency tasks.