Adam: A Method for Stochastic Optimization, Diederik P. Kingma and Jimmy Ba, 2015, International Conference on Learning Representations (ICLR 2015), DOI: 10.48550/arXiv.1412.6980 - Original paper introducing the Adam optimizer and its variant Adamax, detailing their derivation and properties.
Incorporating Nesterov Momentum into Adam, Timothy Dozat, 2016, Proceedings of the 4th International Conference on Learning Representations - Report introducing the Nadam optimizer, which integrates Nesterov momentum with Adam's adaptive learning rate mechanism.
tf.keras.optimizers.Adamax, TensorFlow Developers, 2024 (TensorFlow) - Official TensorFlow Keras documentation for the Adamax optimizer, providing usage examples and implementation details.
tf.keras.optimizers.Nadam, TensorFlow Developers, 2024 (TensorFlow) - Official TensorFlow Keras documentation for the Nadam optimizer, outlining its parameters and usage in practice.
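For quick orientation alongside the two documentation entries above, here is a minimal sketch of how both optimizers are instantiated through the Keras API. The hyperparameter values shown are the library defaults at the time of writing, and the toy model is purely illustrative.

```python
import tensorflow as tf

# Illustrative toy model; any Keras model works the same way.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1),
])

# Adamax: the infinity-norm variant of Adam (Kingma & Ba, 2015).
adamax = tf.keras.optimizers.Adamax(
    learning_rate=0.001,  # default step size
    beta_1=0.9,           # decay rate for the first-moment estimate
    beta_2=0.999,         # decay rate for the infinity-norm estimate
)

# Nadam: Adam with Nesterov momentum folded in (Dozat, 2016).
nadam = tf.keras.optimizers.Nadam(learning_rate=0.001)

# Either optimizer plugs into compile() identically.
model.compile(optimizer=adamax, loss="mse")
```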