Deep Learning, Ian Goodfellow, Yoshua Bengio, and Aaron Courville, 2016 (MIT Press) - This foundational deep learning textbook offers comprehensive coverage of optimization algorithms, including the critical role of the learning rate in gradient descent.
CS231n: Convolutional Neural Networks for Visual Recognition - Optimization, Stanford University, 2023 - Provides accessible explanations of optimization algorithms, with detailed discussions and visualizations of the learning rate's impact and strategies for adjusting it during neural network training.
Adam: A Method for Stochastic Optimization, Diederik P. Kingma and Jimmy Ba, 2015 (3rd International Conference on Learning Representations), DOI: 10.48550/arXiv.1412.6980 - Introduces Adam, a widely used adaptive learning rate optimizer. The paper highlights the difficulties of setting a fixed learning rate and motivates the adaptive approaches that the section briefly mentions.