Deep Learning, Ian Goodfellow, Yoshua Bengio, and Aaron Courville, 2016 (MIT Press) - Comprehensive textbook covering fundamental concepts of deep learning, including a detailed explanation of Stochastic Gradient Descent and its variants.
A Stochastic Approximation Method, Herbert Robbins and Sutton Monro, 1951, The Annals of Mathematical Statistics, Vol. 22 (Institute of Mathematical Statistics), DOI: 10.1214/aoms/1177729586 - The foundational paper that introduced the stochastic approximation method, which provides the mathematical basis for Stochastic Gradient Descent.
Optimization Methods for Large-Scale Machine Learning, Léon Bottou, Frank E. Curtis, and Jorge Nocedal, 2018, SIAM Review, Vol. 60 (Society for Industrial and Applied Mathematics), DOI: 10.1137/16M1080173 - A comprehensive survey of optimization methods for large-scale machine learning, with a significant focus on Stochastic Gradient Descent, its theoretical properties, and its practical applications.