Rethinking Attention with Performers, Krzysztof Choromanski, Valerii Likhosherstov, David Dohan, Xingyou Song, Andreea Gane, Tamas Sarlos, Peter Hawkins, Jared Davis, Afroz Mohiuddin, Lukasz Kaiser, David Belanger, Lucy Colwell, Adrian Weller, 2021. International Conference on Learning Representations (ICLR 2021). DOI: 10.48550/arXiv.2009.14794 - The original paper introducing the Performer architecture and the FAVOR+ mechanism, which achieves a linear-time approximation of attention.
Efficient Transformers: A Survey, Yi Tay, Mostafa Dehghani, Dara Bahri, and Donald Metzler, 2022. ACM Computing Surveys, Vol. 55 (Association for Computing Machinery). DOI: 10.1145/3530811 - A comprehensive overview of techniques for improving the efficiency of Transformer models, covering linear attention mechanisms such as the Performer.
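To make the first entry's claim concrete, the core idea of FAVOR+ can be sketched in a few lines: map queries and keys through positive random features so that their dot products approximate the softmax kernel, then exploit associativity to avoid materializing the n x n attention matrix. This is a minimal illustrative sketch (bidirectional case only, no causal masking, no feature redrawing); the function and parameter names are my own, not from the paper.

```python
import numpy as np

def positive_random_features(x, proj):
    """FAVOR+ positive features: phi(x) = exp(w.x - |x|^2 / 2) / sqrt(m).

    For Gaussian rows w, E[phi(x) . phi(y)] = exp(x . y), the softmax kernel.
    """
    m = proj.shape[0]
    return np.exp(x @ proj.T - np.sum(x ** 2, axis=-1, keepdims=True) / 2.0) / np.sqrt(m)

def favor_plus_attention(q, k, v, n_features=256, seed=0):
    """Linear-time approximation of softmax attention (illustrative sketch)."""
    d = q.shape[-1]
    rng = np.random.default_rng(seed)
    proj = rng.standard_normal((n_features, d))
    # Scale by d**-0.25 so phi(q).phi(k) estimates exp(q.k / sqrt(d)),
    # matching the scaled dot-product attention kernel.
    q_p = positive_random_features(q / d ** 0.25, proj)
    k_p = positive_random_features(k / d ** 0.25, proj)
    # Associativity trick: Q' (K'^T V) costs O(n * m * d) instead of O(n^2 * d).
    kv = k_p.T @ v                      # (m, d_v)
    normalizer = q_p @ k_p.sum(axis=0)  # per-query estimate of the softmax denominator
    return (q_p @ kv) / normalizer[:, None]
```

The key point surveyed in both references is the second comment: because the feature maps factor the attention kernel, the order of matrix multiplications can be rearranged so cost grows linearly, not quadratically, in sequence length.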