Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, 2017. Advances in Neural Information Processing Systems 30. DOI: 10.48550/arXiv.1706.03762 - Introduces the Transformer architecture and the fixed sinusoidal positional encoding mechanism.
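The fixed sinusoidal encoding referenced above assigns each position a deterministic vector of sines and cosines at geometrically spaced frequencies, PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)). A minimal NumPy sketch (function name and shapes are illustrative, not from the paper):

```python
import numpy as np

def sinusoidal_encoding(max_len: int, d_model: int) -> np.ndarray:
    """Fixed sinusoidal positional encoding (Vaswani et al., 2017):
    PE[pos, 2i]   = sin(pos / 10000^(2i / d_model))
    PE[pos, 2i+1] = cos(pos / 10000^(2i / d_model))
    """
    positions = np.arange(max_len)[:, None]                 # (max_len, 1)
    div = 10000.0 ** (np.arange(0, d_model, 2) / d_model)   # (d_model/2,)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(positions / div)   # even dimensions
    pe[:, 1::2] = np.cos(positions / div)   # odd dimensions
    return pe

pe = sinusoidal_encoding(max_len=50, d_model=16)
# Position 0 maps to sin(0)=0 on even dims and cos(0)=1 on odd dims.
```

Because the encoding is a fixed function of position, it requires no trained parameters and extrapolates to any sequence length, in contrast to the learned embeddings of the BERT entry below.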
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova, 2018. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). DOI: 10.48550/arXiv.1810.04805 - Introduces BERT, a Transformer-based language model that uses learned positional embeddings.
Speech and Language Processing (3rd ed. draft), Daniel Jurafsky, James H. Martin, 2025 - A textbook offering explanations of Transformer architectures, attention mechanisms, and positional encoding methods.
torch.nn.Embedding, PyTorch Authors, 2024 - Official documentation for PyTorch's nn.Embedding module, used for implementing learned positional embeddings.
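As the nn.Embedding entry suggests, BERT-style learned positional embeddings can be implemented as an embedding table indexed by position. A minimal sketch, assuming a batch-first (batch, seq_len, d_model) layout; the class name and shapes are illustrative choices, not from the documentation:

```python
import torch
import torch.nn as nn

class LearnedPositionalEmbedding(nn.Module):
    """One trainable d_model-dimensional vector per position index,
    added to the token embeddings (as in BERT)."""

    def __init__(self, max_len: int, d_model: int):
        super().__init__()
        self.pos_emb = nn.Embedding(max_len, d_model)

    def forward(self, token_embeddings: torch.Tensor) -> torch.Tensor:
        # token_embeddings: (batch, seq_len, d_model)
        seq_len = token_embeddings.size(1)
        positions = torch.arange(seq_len, device=token_embeddings.device)
        return token_embeddings + self.pos_emb(positions)  # broadcast over batch

x = torch.zeros(2, 10, 32)                       # dummy token embeddings
out = LearnedPositionalEmbedding(512, 32)(x)     # shape: (2, 10, 32)
```

Unlike the fixed sinusoidal scheme, these vectors are learned during pre-training and are limited to the max_len positions allocated in the table (512 in the original BERT).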