Rethinking Attention with Performers, Krzysztof Choromanski, Valerii Likhosherstov, David Dohan, Xingyou Song, Andreea Gane, Tamas Sarlos, Peter Hawkins, Jared Davis, Afroz Mohiuddin, Lukasz Kaiser, David Belanger, Lucy Colwell, Adrian Weller, 2021, International Conference on Learning Representations (ICLR). DOI: 10.48550/arXiv.2009.14794 - Introduces the Performer, which uses FAVOR+ (Fast Attention Via positive Orthogonal Random features) to approximate softmax attention through a kernel feature map, achieving linear time and space complexity in sequence length.
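As a minimal NumPy sketch of the idea (not the authors' implementation): positive random features approximate the softmax kernel, so attention reduces to two matrix products and never materializes the n-by-n attention matrix. The orthogonalization of the random projections that the paper uses to lower estimator variance is omitted here, and all function names are illustrative.

```python
import numpy as np

def positive_random_features(x, proj, eps=1e-6):
    # Positive random features for the softmax kernel (FAVOR+-style):
    # phi(x) = exp(W x - ||x||^2 / 2) / sqrt(m), with rows of W drawn from N(0, I),
    # so that E[phi(q) . phi(k)] = exp(q . k).
    m = proj.shape[0]
    norm = np.sum(x ** 2, axis=-1, keepdims=True) / 2.0
    return np.exp(x @ proj.T - norm) / np.sqrt(m) + eps

def linear_attention(Q, K, V, num_features=128, seed=0):
    # Approximate softmax attention in O(n * m * d) rather than O(n^2 * d).
    d = Q.shape[-1]
    rng = np.random.default_rng(seed)
    # Plain Gaussian projections; the paper additionally orthogonalizes the rows.
    proj = rng.standard_normal((num_features, d))
    scale = d ** -0.25  # fold the 1/sqrt(d) softmax scaling into Q and K
    q_prime = positive_random_features(Q * scale, proj)      # (n, m)
    k_prime = positive_random_features(K * scale, proj)      # (n, m)
    kv = k_prime.T @ V                                       # (m, d_v), computed once
    normalizer = q_prime @ k_prime.sum(axis=0)               # (n,) softmax denominator estimate
    return (q_prime @ kv) / normalizer[:, None]
```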
Linformer: Self-Attention with Linear Complexity, Sinong Wang, Belinda Z. Li, Madian Khabsa, Han Fang, Hao Ma, 2020, arXiv preprint. DOI: 10.48550/arXiv.2006.04768 - Proposes Linformer, a Transformer variant that achieves linear complexity in sequence length by projecting the key and value matrices to a fixed low-rank representation along the sequence dimension.
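Schematically (a sketch of the mechanism, not the released code): learned projection matrices E and F of shape (k, n), with k much smaller than n, compress the keys and values along the sequence axis before standard attention is applied, so the score matrix is n-by-k instead of n-by-n.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def linformer_attention(Q, K, V, E, F):
    # E, F: (k, n) projections over the sequence dimension (learned in the model,
    # passed in here for illustration). Cost is O(n * k) instead of O(n^2).
    K_proj = E @ K                          # (k, d)
    V_proj = F @ V                          # (k, d_v)
    d = Q.shape[-1]
    scores = Q @ K_proj.T / np.sqrt(d)      # (n, k) score matrix
    return softmax(scores, axis=-1) @ V_proj

# Example shapes: n = 1024 tokens compressed to k = 64 projected positions.
n, k, d = 1024, 64, 32
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
E, F = (rng.standard_normal((k, n)) / np.sqrt(n) for _ in range(2))
out = linformer_attention(Q, K, V, E, F)    # (n, d)
```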
Long-Range Arena: A Benchmark for Efficient Transformers, Yi Tay, Mostafa Dehghani, Samira Abnar, Yikang Shen, Dara Bahri, Philip Pham, Jinfeng Rao, Liu Yang, Sebastian Ruder, Donald Metzler, 2021, International Conference on Learning Representations (ICLR). DOI: 10.48550/arXiv.2011.04006 - Introduces a standardized benchmark for evaluating the efficiency and accuracy of Transformer architectures, including linear attention variants, on long-range dependency tasks.