Deep Learning, Ian Goodfellow, Yoshua Bengio, and Aaron Courville, 2016 (MIT Press) - A foundational textbook providing an academic introduction to deep learning, covering model parameters, learning algorithms, and neural network architectures. Essential for understanding the theoretical basis of LLM parameters.
Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin, 2017, Advances in Neural Information Processing Systems 30 (NIPS 2017) (Curran Associates, Inc.) - The seminal paper introducing the Transformer architecture, which forms the basis of modern LLMs. It details the attention-based model structure in which an LLM's parameters reside and how those parameters enable language processing.
CS224N: Natural Language Processing with Deep Learning, Diyi Yang and Tatsunori Hashimoto, 2025 (Stanford University) - A comprehensive university course applying deep learning fundamentals to NLP, with detailed treatment of model parameters, neural network architectures, and the Transformer model. Provides accessible educational context.