Pattern Recognition and Machine Learning, Christopher M. Bishop, 2006 (Springer) DOI: 10.1007/b139933 - Offers a comprehensive and mathematically rigorous explanation of optimization methods, including gradient descent, in the context of machine learning.
Deep Learning, Ian Goodfellow, Yoshua Bengio, and Aaron Courville, 2016 (MIT Press) - Provides a detailed discussion of optimization algorithms for training deep models, including the mechanics and characteristics of Batch Gradient Descent.
CS229 Lecture Notes: Linear Regression, Andrew Ng, Tengyu Ma, 2023 (Stanford University) - A widely recognized academic resource that provides a clear and foundational introduction to gradient descent, demonstrating its application in linear regression.