Deep Learning, Ian Goodfellow, Yoshua Bengio, and Aaron Courville, 2016 (MIT Press) - Provides a comprehensive introduction to optimization algorithms, including gradient descent, its mathematical principles, and practical considerations for machine learning.
An Introduction to Statistical Learning with Applications in R, Gareth James, Daniela Witten, Trevor Hastie, and Rob Tibshirani, 2013 (Springer) - Covers linear regression models and the fundamental optimization techniques used to fit them, providing a statistical learning perspective on the process.
CS229 Lecture Notes: Supervised Learning, Linear Regression, Andrew Ng, 2018 (Stanford University) - Introduces linear regression and thoroughly explains the gradient descent algorithm, including the cost function, partial derivatives, and parameter updates, as part of a foundational machine learning course.
Numerical Optimization, Jorge Nocedal and Stephen J. Wright, 2006 (Springer), DOI: 10.1007/978-0-387-40065-5 - Offers a rigorous mathematical treatment of optimization algorithms, including the theoretical underpinnings and practical considerations of gradient descent methods.