Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, Illia Polosukhin, 2017. Advances in Neural Information Processing Systems, Vol. 30 (Curran Associates, Inc.). DOI: 10.48550/arXiv.1706.03762 - This paper introduces the Transformer, an architecture based entirely on attention mechanisms, which reshaped sequence modeling and language modeling.
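Speech and Language Processing, Daniel Jurafsky and James H. Martin, 2025 (Pearson) - A comprehensive textbook covering traditional and neural language models and their applications in speech recognition, including recent developments such as the Transformer. (Third edition draft)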