CS224N: Natural Language Processing with Deep Learning, Christopher Manning, Richard Socher, Abolfazl Asudeh, John Hewitt, Chenhao Tan, 2023 (Stanford University) - An excellent university course covering deep learning methods applied to natural language processing, essential for understanding modern language models.
Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, 2017. Advances in Neural Information Processing Systems, Vol. 30. DOI: 10.48550/arXiv.1706.03762 - The seminal paper introducing the Transformer architecture, which forms the basis for most large language models and their advancements.