Deep Learning, Ian Goodfellow, Yoshua Bengio, Aaron Courville, 2016 (MIT Press) - Provides a comprehensive and foundational treatment of gradient descent and its application in deep learning, covering both theoretical aspects and practical considerations.
Pattern Recognition and Machine Learning, Christopher M. Bishop, 2006 (Springer) DOI: 10.1007/978-0-387-45528-0 - A classic machine learning textbook offering a rigorous mathematical introduction to optimization techniques, including the principles of gradient descent for training models.