Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, 2017, Advances in Neural Information Processing Systems (NeurIPS 2017), DOI: 10.48550/arXiv.1706.03762 - The seminal paper that introduced the Transformer architecture, laying the foundation for all subsequent Transformer variants, including Transformer-XL.
Natural Language Processing with Transformers, Lewis Tunstall, Leandro von Werra, and Thomas Wolf, 2022 (O'Reilly Media) - A comprehensive guide to Transformer models, including an explanation of Transformer-XL and its mechanisms in the broader context of advanced architectures.