Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, Illia Polosukhin, 2017. Advances in Neural Information Processing Systems 30 (Curran Associates, Inc.) - Introduces the Transformer architecture and its original absolute sinusoidal positional encodings, whose limitations are discussed in this section.
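As a concrete reference point, here is a minimal NumPy sketch of the sinusoidal encoding defined in that paper, where PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)); the function name is illustrative, not from the paper:

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Absolute sinusoidal encodings from Vaswani et al. (2017).

    Even dimensions get sines, odd dimensions get cosines, with
    wavelengths forming a geometric progression from 2*pi to 10000*2*pi.
    Assumes d_model is even.
    """
    positions = np.arange(seq_len)[:, None]          # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]         # (1, d_model/2)
    angles = positions / np.power(10000.0, dims / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

# Usage: these encodings are added to the token embeddings before the first layer.
pe = sinusoidal_positional_encoding(seq_len=128, d_model=512)
print(pe.shape)  # (128, 512)
```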
Self-Attention with Relative Position Representations, Peter Shaw, Jakob Uszkoreit, Ashish Vaswani, 2018. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers) (Association for Computational Linguistics). DOI: 10.18653/v1/N18-1059 - Proposes a method to explicitly encode relative positional information into the self-attention mechanism, addressing a key limitation of absolute positional encodings.
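A minimal sketch of that idea, assuming a single attention head and only the relative-key term a^K from the paper (the full method also adds relative embeddings to the values); all variable names here are illustrative:

```python
import numpy as np

def relative_attention_logits(q: np.ndarray, k: np.ndarray,
                              rel_emb: np.ndarray, max_rel: int) -> np.ndarray:
    """Attention logits with relative position keys (Shaw et al., 2018).

    q, k:    (seq_len, d) already-projected query/key matrices
    rel_emb: (2 * max_rel + 1, d) learned embeddings a^K, indexed by the
             relative distance j - i clipped to [-max_rel, max_rel]
    """
    seq_len, d = q.shape
    content = q @ k.T                                 # content term: q_i . k_j
    # Relative term: q_i . a^K_{clip(j - i)}
    idx = np.clip(np.arange(seq_len)[None, :] - np.arange(seq_len)[:, None],
                  -max_rel, max_rel) + max_rel        # (seq_len, seq_len)
    rel = np.einsum('id,ijd->ij', q, rel_emb[idx])    # gather embeddings, then dot
    return (content + rel) / np.sqrt(d)

# Usage with random projections and embeddings:
rng = np.random.default_rng(0)
L, d, max_rel = 6, 8, 4
logits = relative_attention_logits(rng.normal(size=(L, d)),
                                   rng.normal(size=(L, d)),
                                   rng.normal(size=(2 * max_rel + 1, d)),
                                   max_rel)
print(logits.shape)  # (6, 6)
```

Clipping distances beyond max_rel is what lets the learned table stay fixed-size regardless of sequence length.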
RoFormer: Enhanced Transformer with Rotary Position Embedding, Jianlin Su, Yu Lu, Shengfeng Pan, Ahmed Murtadha, Bo Wen, Yunfeng Liu, 2021. arXiv preprint arXiv:2104.09864. DOI: 10.48550/arXiv.2104.09864 - Introduces Rotary Position Embedding (RoPE), which encodes position by rotating query and key vectors, addressing the challenges of relative position representation and sequence length extrapolation in Transformers.
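A minimal sketch of the rotation RoPE applies to queries and keys, assuming the paper's base of 10000 and the pairing of adjacent dimensions; the function name is illustrative:

```python
import numpy as np

def apply_rope(x: np.ndarray) -> np.ndarray:
    """Rotary position embedding (Su et al., 2021), minimal form.

    x: (seq_len, d) query or key vectors, d even. Each dimension pair
    (2i, 2i+1) at position pos is rotated by angle pos * theta_i, with
    theta_i = 10000^(-2i/d).
    """
    seq_len, d = x.shape
    theta = np.power(10000.0, -np.arange(0, d, 2) / d)      # (d/2,)
    angles = np.arange(seq_len)[:, None] * theta[None, :]   # (seq_len, d/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin   # 2D rotation of each pair
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

# Usage: applied to queries and keys before the attention dot product.
q = apply_rope(np.random.default_rng(0).normal(size=(16, 64)))
print(q.shape)  # (16, 64)
```

Because both queries and keys are rotated this way, the dot product between a query at position m and a key at position n depends only on the offset m - n, which is what gives RoPE its relative-position behavior.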