A Neural Probabilistic Language Model, Yoshua Bengio, Réjean Ducharme, Pascal Vincent, and Christian Jauvin, 2003, Journal of Machine Learning Research, Vol. 3 - A foundational paper that introduced one of the earliest neural-network-based language models that learn word embeddings.
Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, Illia Polosukhin, 2017, Advances in Neural Information Processing Systems 30 (Curran Associates, Inc.) - Introduced the Transformer architecture, which revolutionized sequence modeling and became the foundation of modern large language models.