Deep Learning, Ian Goodfellow, Yoshua Bengio, and Aaron Courville, 2016 (MIT Press) - A comprehensive textbook covering the mathematical and conceptual foundations of deep learning, including optimization algorithms, backpropagation, and general training methodologies.
Adam: A Method for Stochastic Optimization, Diederik P. Kingma and Jimmy Ba, 2015 (International Conference on Learning Representations, ICLR), DOI: 10.48550/arXiv.1412.6980 - Introduces the Adam optimizer, a widely used adaptive learning rate algorithm for training deep neural networks, detailing its mechanism and advantages.