CS230: Deep Learning, Lecture Notes - Optimization Algorithms, Matt Deitke, 2020 (Stanford University) - Official lecture notes from a renowned university course, explaining mini-batch gradient descent as a core optimization method in deep learning.
Optimization Methods for Large-Scale Machine Learning, Léon Bottou, Frank E. Curtis, and Jorge Nocedal, 2018, SIAM Review, Vol. 60 (Society for Industrial and Applied Mathematics), DOI: 10.1137/16M1080173 - A survey article providing an in-depth review of various optimization techniques for large-scale machine learning, with significant coverage of stochastic and mini-batch gradient methods.