Adam: A Method for Stochastic Optimization, Diederik P. Kingma, Jimmy Ba, 2015 (International Conference on Learning Representations, ICLR), DOI: 10.48550/arXiv.1412.6980 - The original paper that introduced the Adam optimizer, providing the foundation upon which AMSGrad was developed as an improvement; a sketch of the two update rules follows this list.
Optimization for Deep Learning, Stanford University CS231n Course Staff, 2023 (Stanford University) - Comprehensive lecture notes from Stanford's CS231n course, covering various optimization algorithms in deep learning, including Adam and a discussion of AMSGrad.
torch.optim.Adam, PyTorch Authors, 2024 (PyTorch) - Official PyTorch documentation for the Adam optimizer, including practical details on enabling the AMSGrad variant via the amsgrad flag; a minimal usage sketch appears after this list.
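As background for the first entry, a brief sketch of how the two methods relate: Adam maintains exponential moving averages of the gradient and its square, and AMSGrad (introduced by Reddi et al., 2018, "On the Convergence of Adam and Beyond") replaces the second-moment term in the denominator with its running maximum, so the effective per-coordinate step size never increases. Notation follows the papers; bias-correction terms are omitted for brevity.

```latex
% Adam's moment estimates for gradient g_t (Kingma & Ba, 2015),
% with the AMSGrad modification; bias correction omitted for brevity.
\begin{align}
  m_t       &= \beta_1 m_{t-1} + (1 - \beta_1)\, g_t    && \text{(first moment)} \\
  v_t       &= \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2  && \text{(second moment)} \\
  \hat{v}_t &= \max(\hat{v}_{t-1},\, v_t)               && \text{(AMSGrad's running max)} \\
  \theta_{t+1} &= \theta_t - \frac{\alpha}{\sqrt{\hat{v}_t} + \epsilon}\, m_t
\end{align}
```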
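And for the third entry, a minimal sketch of enabling AMSGrad through the documented amsgrad argument of torch.optim.Adam; the model, data, and hyperparameters here are placeholders chosen only for illustration:

```python
import torch

# Placeholder model and batch, only to make the example runnable.
model = torch.nn.Linear(10, 1)
x, y = torch.randn(32, 10), torch.randn(32, 1)

# amsgrad=True switches Adam to the AMSGrad variant, which tracks the
# running maximum of the second-moment estimate (see the sketch above).
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, amsgrad=True)

# One standard optimization step.
loss = torch.nn.functional.mse_loss(model(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```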