Deep Learning, Ian Goodfellow, Yoshua Bengio, and Aaron Courville, 2016 (MIT Press) - Chapter 8, 'Optimization for Training Deep Models,' provides a thorough explanation of iterative optimization, gradient descent, and its variants in the context of deep learning.
Numerical Optimization, Jorge Nocedal and Stephen J. Wright, 2006 (Springer), DOI: 10.1007/978-0-387-40065-5 - A classic and comprehensive textbook covering the mathematical foundations and algorithms for numerical optimization, including various forms of gradient descent and related methods.
Adam: A Method for Stochastic Optimization, Diederik P. Kingma and Jimmy Ba, 2015 (International Conference on Learning Representations, ICLR), DOI: 10.48550/arXiv.1412.6980 - Introduces the Adam optimizer, a widely used adaptive learning rate algorithm that builds upon the principles of stochastic gradient descent.
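To connect this last reference to the algorithm it introduces, here is a minimal NumPy sketch of the Adam update rule described in the paper. The function name `adam_step`, the toy quadratic objective, and the update loop are illustrative assumptions made for this sketch; the default hyperparameters follow the values suggested by Kingma and Ba.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update (illustrative sketch, not the authors' reference code).

    theta : parameter array
    grad  : gradient of the objective at theta
    m, v  : running first and second moment estimates (same shape as theta)
    t     : 1-based step counter, used for bias correction
    """
    m = beta1 * m + (1 - beta1) * grad         # exponential moving average of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2    # moving average of squared gradients
    m_hat = m / (1 - beta1 ** t)               # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)               # bias-corrected second moment
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Toy usage (hypothetical objective): minimize f(theta) = theta**2.
theta = np.array([5.0])
m = np.zeros_like(theta)
v = np.zeros_like(theta)
for t in range(1, 1001):
    grad = 2 * theta                           # gradient of f(theta) = theta**2
    theta, m, v = adam_step(theta, grad, m, v, t, lr=0.1)
```

The per-parameter scaling by the second-moment estimate is what distinguishes Adam from plain stochastic gradient descent, which applies a single global learning rate to every parameter.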