Linformer: Self-Attention with Linear Complexity. Sinong Wang, Belinda Z. Li, Madian Khabsa, Han Fang, Hao Ma. 2020. Advances in Neural Information Processing Systems (NeurIPS), Vol. 33 (Curran Associates, Inc.). DOI: 10.48550/arXiv.2006.04768 - Introduces the Linformer model, which achieves linear complexity in self-attention by projecting the keys and values to a lower, fixed dimension before computing attention (a minimal sketch of this idea follows this list).
Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention. Yunyang Xiong, Zhanpeng Zeng, Rudrasis Chakraborty, Mingxing Tan, Glenn Fung, Yin Li, Vikas Singh. 2021. Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 35. DOI: 10.48550/arXiv.2102.03902 - Presents an alternative efficient Transformer that uses the Nyström method to form a low-rank approximation of the attention matrix.
Efficient Transformers: A Survey. Yi Tay, Mostafa Dehghani, Dara Bahri, Donald Metzler. 2022. ACM Computing Surveys, Vol. 55 (ACM). DOI: 10.48550/arXiv.2009.06732 - Offers an overview and taxonomy of efficient Transformer architectures, including Linformer and other linear-attention models.
Attention Is All You Need. Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, Illia Polosukhin. 2017. Advances in Neural Information Processing Systems (NeurIPS), Vol. 30 (Curran Associates, Inc.). DOI: 10.48550/arXiv.1706.03762 - The foundational paper that introduced the Transformer architecture and the self-attention mechanism, highlighting the quadratic complexity bottleneck that Linformer addresses.
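
To make the contrast between the last reference and the first concrete, here is a minimal NumPy sketch (not the authors' reference implementation) of the Linformer idea: standard self-attention builds an n×n score matrix, whereas the Linformer-style variant projects the n×d key and value matrices down to k×d with projection matrices (here random placeholders `E` and `F`; in the paper they are learned and can be shared across heads and layers), so the score matrix is only n×k and cost grows linearly in n for fixed k.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def standard_attention(Q, K, V):
    # Scores are (n, n): quadratic in sequence length n.
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    return softmax(scores) @ V

def linformer_attention(Q, K, V, E, F):
    # E, F are (k, n) projections along the sequence axis;
    # scores are (n, k), i.e. linear in n for a fixed k.
    d = Q.shape[-1]
    K_proj = E @ K                        # (k, d)
    V_proj = F @ V                        # (k, d)
    scores = Q @ K_proj.T / np.sqrt(d)    # (n, k)
    return softmax(scores) @ V_proj       # (n, d)

if __name__ == "__main__":
    n, d, k = 1024, 64, 128               # sequence length, head dim, projection dim
    rng = np.random.default_rng(0)
    Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
    E, F = (rng.standard_normal((k, n)) / np.sqrt(n) for _ in range(2))
    print(standard_attention(Q, K, V).shape)         # (1024, 64)
    print(linformer_attention(Q, K, V, E, F).shape)  # (1024, 64)
```

With k held constant, the projected variant needs O(n·k) time and memory instead of O(n²), which is the trade-off the Linformer paper analyzes.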