Deep Learning, Ian Goodfellow, Yoshua Bengio, Aaron Courville, 2016 (MIT Press) - A standard textbook covering fundamental concepts and algorithms in deep learning, including detailed discussions of classical optimization methods like SGD, momentum, and adaptive learning rate techniques.
Adam: A Method for Stochastic Optimization, Diederik P. Kingma, Jimmy Ba, 2014, 3rd International Conference for Learning Representations (ICLR 2015), DOI: 10.48550/arXiv.1412.6980 - The original research paper introducing the Adam optimization algorithm, a widely adopted adaptive learning rate method in machine learning and quantum machine learning.
Barren Plateaus in Quantum Neural Network Training Landscapes, Jarrod R. McClean, Sergio Boixo, Vadim N. Smelyanskiy, Ryan Babbush, Hartmut Neven, 2018, Nature Communications, Vol. 9 (Springer Nature), DOI: 10.1038/s41467-018-07090-4 - A significant paper that describes the 'barren plateau' phenomenon, a major challenge for training deep parameterized quantum circuits due to vanishing gradients.