Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, 2017. arXiv preprint arXiv:1706.03762. DOI: 10.48550/arXiv.1706.03762 - The foundational paper introducing the Transformer model and detailing the original sinusoidal positional encoding, including its mathematical formulation and its ability to represent relative positions.
CS224N: Natural Language Processing with Deep Learning, Lecture 10: Transformers and Pretraining, Abigail See, Kevin Clark, Yuval Pinter, 2023 (Stanford University) - Offers an accessible explanation of the Transformer architecture, including the motivation for sinusoidal positional encodings and how trigonometric identities let them express relative positions.
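The sinusoidal encoding described in the first entry can be sketched directly from the paper's formulation, PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)). Below is a minimal NumPy illustration; the function name and dimensions are chosen here for clarity and are not from either source.

```python
import numpy as np

def sinusoidal_positional_encoding(max_len, d_model):
    """Sinusoidal positional encoding from Vaswani et al. (2017).

    PE[pos, 2i]   = sin(pos / 10000^(2i / d_model))
    PE[pos, 2i+1] = cos(pos / 10000^(2i / d_model))

    Assumes d_model is even, as in the original formulation.
    """
    positions = np.arange(max_len)[:, np.newaxis]                      # shape (max_len, 1)
    div_terms = np.power(10000.0, np.arange(0, d_model, 2) / d_model)  # shape (d_model/2,)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(positions / div_terms)  # even dimensions use sine
    pe[:, 1::2] = np.cos(positions / div_terms)  # odd dimensions use cosine
    return pe
```

Because each (sin, cos) pair at a fixed frequency behaves like a rotation, the encoding at position pos + k is a fixed linear function of the encoding at pos, which is the relative-position property both references highlight.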