Pattern Recognition and Machine Learning, Christopher M. Bishop, 2006 (Springer) - This foundational textbook introduces gradient descent as an optimization technique in machine learning, with a focus on iterative parameter updates.
The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Trevor Hastie, Robert Tibshirani, and Jerome Friedman, 2009 (Springer) - Provides a comprehensive treatment of gradient boosting, detailing its interpretation as functional gradient descent and the underlying loss minimization.
Convex Optimization, Stephen Boyd and Lieven Vandenberghe, 2004 (Cambridge University Press) - Provides a rigorous mathematical treatment of gradient descent methods, including their convergence properties.