Adam: A Method for Stochastic Optimization, Diederik P. Kingma and Jimmy Ba, 2015 (International Conference on Learning Representations), DOI: 10.48550/arXiv.1412.6980 - This foundational paper introduces the Adam optimizer and provides context for RMSprop as a precursor, discussing its motivation and comparing it to other adaptive methods.
Deep Learning, Ian Goodfellow, Yoshua Bengio, and Aaron Courville, 2016 (MIT Press) - A comprehensive textbook explanation of adaptive learning rate methods, including RMSprop, its mechanics, and its place among other optimizers.