Deep Learning, Ian Goodfellow, Yoshua Bengio, and Aaron Courville, 2016 (MIT Press) - A foundational deep learning textbook that provides a comprehensive treatment of optimization algorithms, including AdaGrad, and their role in training neural networks.
Optimization: Stochastic Gradient Descent, Justin Johnson, Andrej Karpathy, and Fei-Fei Li, 2023 (Stanford University CS231n Course Notes) - An accessible, practical explanation of various optimization algorithms, including AdaGrad, widely used as an educational resource for deep learning.