Deep Learning, Ian Goodfellow, Yoshua Bengio, and Aaron Courville, 2016 (MIT Press) - Comprehensive textbook covering fundamental concepts of deep learning, including various optimization algorithms discussed in the section.
On the Importance of Initialization and Momentum in Deep Learning, Ilya Sutskever, James Martens, George Dahl, and Geoffrey Hinton, 2013, Proceedings of the 30th International Conference on Machine Learning (ICML), Vol. 28 (PMLR) - Influential paper exploring the role of momentum in accelerating training and improving convergence of deep neural networks.
Adam: A Method for Stochastic Optimization, Diederik P. Kingma and Jimmy Ba, 2015, International Conference on Learning Representations (ICLR), DOI: 10.48550/arXiv.1412.6980 - Seminal paper introducing the Adam optimizer, which combines momentum and adaptive learning rates and is a popular default choice for many deep learning tasks.