Deep Learning, Ian Goodfellow, Yoshua Bengio, Aaron Courville, 2016 (MIT Press) - A comprehensive resource on deep learning, with dedicated chapters on optimization algorithms such as gradient descent, SGD, momentum, RMSProp, and Adam.
Adam: A Method for Stochastic Optimization, Diederik P. Kingma, Jimmy Ba, 2015, 3rd International Conference on Learning Representations (ICLR), DOI: 10.48550/arXiv.1412.6980 - The original research paper introducing the Adam optimization algorithm, explaining its adaptive per-parameter learning rates derived from estimates of the first and second moments of the gradients (see the sketch after this list).
tf.keras.optimizers module, TensorFlow Developers, 2024 - Official TensorFlow documentation for Keras optimizers, detailing their usage, parameters, and available algorithms such as SGD, Adam, and RMSprop for practical implementation (see the usage example after this list).
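
To make the mechanism described in the Kingma & Ba entry concrete, here is a minimal NumPy sketch of a single Adam update with bias-corrected first and second moment estimates; the function and variable names are illustrative and not taken from any of the sources above.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam parameter update (Kingma & Ba, 2015), written out for illustration."""
    m = beta1 * m + (1 - beta1) * grad         # first moment: moving average of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2    # second moment: moving average of squared gradients
    m_hat = m / (1 - beta1 ** t)               # bias correction (t counts steps from 1)
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)  # adaptive per-parameter step
    return theta, m, v
```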
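
As a practical counterpart to the tf.keras.optimizers entry, the snippet below sketches how an Adam optimizer is configured and attached to a model in Keras; the model architecture and hyperparameter values are placeholders chosen for illustration, not a recommendation from the documentation.

```python
import tensorflow as tf

# Simple placeholder model; any Keras model is compiled the same way.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# Configure the optimizer explicitly; beta_1 and beta_2 control the
# exponential moving averages of the first and second gradient moments.
optimizer = tf.keras.optimizers.Adam(
    learning_rate=1e-3, beta_1=0.9, beta_2=0.999, epsilon=1e-7
)

model.compile(optimizer=optimizer,
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```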