Deep Learning, Ian Goodfellow, Yoshua Bengio, and Aaron Courville, 2016 (MIT Press) - This book provides a comprehensive theoretical and practical foundation for deep learning, including the role of parameters in neural networks and the computational aspects of training and inference, which are fundamental to understanding memory consumption.
Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, Illia Polosukhin, 2017, Advances in Neural Information Processing Systems, Vol. 30 (Curran Associates, Inc.) - The seminal paper introducing the Transformer architecture, which underpins most Large Language Models. Understanding the structure of Transformers helps explain why these models have such vast parameter counts and, consequently, high memory demands.
CUDA C++ Programming Guide, NVIDIA Corporation, Latest Edition (NVIDIA Corporation) - Provides detailed information on NVIDIA GPU architecture, including the hierarchy and characteristics of device memory (VRAM) and its role in high-performance computing for applications like deep learning.
CS224N: Natural Language Processing with Deep Learning, Diyi Yang, Tatsunori Hashimoto, 2025 (Stanford University) - Provides lecture materials and assignments that discuss the computational requirements and practical considerations for training and deploying large language models, including discussions on memory.
Scaling Laws for Neural Language Models, Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B. Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, Dario Amodei, 2020, arXiv preprint arXiv:2001.08361, DOI: 10.48550/arXiv.2001.08361 - This paper directly explores how model parameters affect performance and, by extension, the computational and memory resources required for effective LLM operation.