Deep Learning, Ian Goodfellow, Yoshua Bengio, and Aaron Courville, 2016 (MIT Press) - A comprehensive textbook that covers gradient descent and the learning rate as a core hyperparameter in optimization algorithms. Chapter 8, 'Optimization for Training Deep Models,' is especially useful.
CS229 Lecture Notes - Supervised Learning, Part I, Andrew Ng and Tengyu Ma, 2022 (Stanford University) - Part of the official lecture notes from Stanford's Machine Learning course; gives a foundational explanation of gradient descent and of how the learning rate is set, in the context of linear regression (see the first sketch below).
Adam: A Method for Stochastic Optimization, Diederik P. Kingma and Jimmy Ba, 2014, published at the International Conference on Learning Representations (ICLR 2015), DOI: 10.48550/arXiv.1412.6980 - This seminal paper introduces the Adam optimizer, a widely adopted adaptive learning rate algorithm. It offers insight into how learning rates can be adjusted dynamically per parameter, a concept mentioned as an advanced topic (see the second sketch below).
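To ground the role of the learning rate discussed in the first two references, here is a minimal Python sketch of batch gradient descent for linear regression. It is not taken from any of the cited sources; the function name, the synthetic data, and the learning-rate value of 0.1 are purely illustrative.

```python
import numpy as np

def gradient_descent(X, y, lr=0.01, n_steps=1000):
    """Fit weights w minimizing the mean squared error (1/2m) * ||Xw - y||^2."""
    m, n = X.shape
    w = np.zeros(n)
    for _ in range(n_steps):
        grad = X.T @ (X @ w - y) / m   # gradient of the MSE loss
        w -= lr * grad                 # step size controlled by the learning rate
    return w

# Example usage on synthetic data (illustrative values only).
rng = np.random.default_rng(0)
X = np.c_[np.ones(100), rng.normal(size=(100, 1))]       # bias column + one feature
y = X @ np.array([2.0, -3.0]) + 0.1 * rng.normal(size=100)
print(gradient_descent(X, y, lr=0.1))                     # approximately [2.0, -3.0]
```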
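For the Adam reference, the sketch below implements the paper's update rule (exponential moving averages of the gradient and its square, bias correction, then a per-parameter step) with the default hyperparameters reported in the paper (lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8). The function name, the toy quadratic objective, and the lr=0.05 used in the demo loop are assumptions made for illustration, not part of the cited paper.

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update (Kingma & Ba) using the paper's default hyperparameters."""
    m = beta1 * m + (1 - beta1) * grad            # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2       # second-moment (uncentered variance) estimate
    m_hat = m / (1 - beta1 ** t)                  # bias correction for the mean
    v_hat = v / (1 - beta2 ** t)                  # bias correction for the variance
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)   # per-parameter adaptive step
    return w, m, v

# Example: minimize f(w) = ||w - target||^2 / 2, whose gradient is (w - target).
target = np.array([2.0, -3.0])
w = np.zeros(2); m = np.zeros_like(w); v = np.zeros_like(w)
for t in range(1, 2001):                          # t starts at 1 for bias correction
    grad = w - target
    w, m, v = adam_step(w, grad, m, v, t, lr=0.05)
print(w)                                          # close to [2.0, -3.0]
```

Unlike the fixed learning rate in the plain gradient descent sketch, Adam scales each parameter's step by an estimate of the gradient's recent magnitude, which is the "dynamic adjustment" the reference alludes to.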