Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, 2017. Advances in Neural Information Processing Systems (NeurIPS 2017). DOI: 10.48550/arXiv.1706.03762 - The seminal paper that introduced the Transformer architecture, providing the basis for all subsequent Transformer variants including Transformer-XL.
Natural Language Processing with Transformers, Lewis Tunstall, Leandro von Werra, and Thomas Wolf, 2022 (O'Reilly Media) - A comprehensive guide to Transformer models, including an explanation of Transformer-XL and its mechanisms within a broader context of advanced architectures.