torch.optim.lr_scheduler documentation, PyTorch Contributors, 2025 - Official guide for implementing learning rate schedulers in PyTorch, covering various built-in strategies.
Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, 2017 (Advances in Neural Information Processing Systems, NeurIPS), DOI: 10.48550/arXiv.1706.03762 - Describes a learning rate schedule combining linear warmup with inverse square root decay, particularly for Transformer models.
Deep Learning, Ian Goodfellow, Yoshua Bengio, and Aaron Courville, 2016 (MIT Press) - A foundational book offering theoretical background on optimization, including learning rate adjustment techniques.
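The warmup-plus-inverse-square-root schedule from the Vaswani et al. entry above can be sketched in a few lines of plain Python; the formula below is Equation 3 of that paper, with the paper's own default hyperparameters (d_model = 512, warmup_steps = 4000) used as illustrative values:

```python
import math


def noam_lr(step: int, d_model: int = 512, warmup_steps: int = 4000) -> float:
    """Learning rate at a (1-indexed) optimizer step.

    Rises linearly for the first warmup_steps steps, then decays
    proportionally to the inverse square root of the step number.
    """
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)


# The two branches of min() intersect exactly at step == warmup_steps,
# which is therefore the peak of the schedule.
peak = noam_lr(4000)
```

In PyTorch this function could be plugged into `torch.optim.lr_scheduler.LambdaLR` (covered in the first reference) as the multiplicative `lr_lambda`, with the optimizer's base learning rate set to 1.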