Numerical Optimization, Jorge Nocedal and Stephen J. Wright, 2006 (Springer) - A classic textbook providing a thorough treatment of optimization algorithms and their convergence analysis, covering gradient methods, Newton's method, and quasi-Newton methods. It details the convergence rates of these methods and the conditions under which each rate holds.
Deep Learning, Ian Goodfellow, Yoshua Bengio, and Aaron Courville, 2016 (MIT Press) - Chapter 8 of this foundational book focuses on optimization for deep models, discussing the challenges posed by non-convexity and saddle points, as well as the practical application and analysis of algorithms such as SGD and momentum in machine learning.
Optimization Methods for Large-Scale Machine Learning, Léon Bottou, Frank E. Curtis, and Jorge Nocedal, 2018, SIAM Review, Vol. 60 (Society for Industrial and Applied Mathematics), DOI: 10.1137/16M1080173 - A comprehensive survey centered on Stochastic Gradient Descent, providing a detailed theoretical foundation for its convergence properties, its principal variants, and its significant role in large-scale machine learning. (A minimal sketch of the update rule these references analyze appears after this list.)
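The second and third entries both analyze SGD and momentum. As a point of reference, below is a minimal sketch of the heavy-ball (momentum) SGD update on a toy least-squares problem. Everything in the snippet is an illustrative assumption rather than material from the cited texts: the names sgd_momentum and stochastic_grad, the hyperparameters, and the synthetic data are all made up for this example.

```python
# Minimal sketch of SGD with heavy-ball momentum on a toy least-squares
# problem. Names, hyperparameters, and data are illustrative only, not
# drawn from the references above.
import numpy as np

def sgd_momentum(grad, theta0, lr=0.01, beta=0.9, steps=500):
    """Iterate the heavy-ball update:
        v_{t+1}     = beta * v_t + grad(theta_t)
        theta_{t+1} = theta_t - lr * v_{t+1}
    """
    theta = np.asarray(theta0, dtype=float)
    v = np.zeros_like(theta)
    for _ in range(steps):
        v = beta * v + grad(theta)
        theta = theta - lr * v
    return theta

# Synthetic regression data: b = A @ true_theta + small noise.
rng = np.random.default_rng(0)
A = rng.normal(size=(256, 4))
true_theta = np.array([1.0, -2.0, 0.5, 3.0])
b = A @ true_theta + 0.01 * rng.normal(size=256)

def stochastic_grad(theta, batch=32):
    """Gradient of the mean squared loss on a random mini-batch
    (the 'stochastic' part of SGD)."""
    idx = rng.integers(0, A.shape[0], size=batch)
    Ai, bi = A[idx], b[idx]
    return 2.0 / batch * Ai.T @ (Ai @ theta - bi)

theta_hat = sgd_momentum(stochastic_grad, np.zeros(4))
print(np.round(theta_hat, 2))  # should land close to [1.0, -2.0, 0.5, 3.0]
```

In this sketch lr / (1 - beta) acts as the effective step size, so the run should recover the planted coefficients up to the mini-batch noise floor; the convergence behavior of exactly this kind of iteration is what the Bottou, Curtis, and Nocedal survey treats rigorously.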