Deep Learning, Ian Goodfellow, Yoshua Bengio, and Aaron Courville, 2016 (MIT Press) - A comprehensive textbook covering fundamental concepts like neural networks, backpropagation, loss functions, and optimizers, essential for understanding the mechanics of fine-tuning.
Adam: A Method for Stochastic Optimization, Diederik P. Kingma and Jimmy Ba, 2015, International Conference on Learning Representations (ICLR), DOI: 10.48550/arXiv.1412.6980 - Introduces the Adaptive Moment Estimation (Adam) optimizer, the foundation for the AdamW optimizer discussed in the section.
Decoupled Weight Decay Regularization, Ilya Loshchilov and Frank Hutter, 2019, International Conference on Learning Representations (ICLR), DOI: 10.48550/arXiv.1711.05101 - Presents AdamW, a variant of Adam that decouples weight decay from the adaptive learning-rate update, which improves regularization when training large models.
CS224n: Natural Language Processing with Deep Learning - Course Materials, Stanford University, 2023 (Stanford University) - Provides excellent course materials covering the fundamental principles of deep learning for NLP, including backpropagation, optimizers, and the training loop relevant to fine-tuning LLMs.
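The key idea connecting the second and third references above, decoupled weight decay, can be sketched in a few lines. This is an illustrative single-parameter version of the AdamW update from the Loshchilov & Hutter paper, not a production implementation; the function name and defaults are chosen for this sketch.

```python
import math

def adamw_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=0.01):
    """One AdamW update for a single scalar parameter (illustrative sketch)."""
    m = beta1 * m + (1 - beta1) * grad        # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2   # second-moment (variance) estimate
    m_hat = m / (1 - beta1 ** t)              # bias correction for step t
    v_hat = v / (1 - beta2 ** t)
    # Decoupled weight decay: the decay term is applied directly to the
    # parameter, instead of being folded into the gradient as L2 regularization
    # would be in classic Adam.
    theta = theta - lr * (m_hat / (math.sqrt(v_hat) + eps) + weight_decay * theta)
    return theta, m, v

theta, m, v = adamw_step(theta=1.0, grad=0.5, m=0.0, v=0.0, t=1)
```

Folding the decay into the gradient instead would let Adam's adaptive scaling shrink the regularization for parameters with large gradient history; decoupling keeps the decay strength uniform, which is the paper's central point.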