Deep Learning, Ian Goodfellow, Yoshua Bengio, and Aaron Courville, 2016 (MIT Press) - Comprehensive theoretical and practical coverage of regularization techniques in deep learning, including L1 and L2 regularization and weight decay.
Decoupled Weight Decay Regularization, Ilya Loshchilov and Frank Hutter, 2019 (International Conference on Learning Representations, ICLR), DOI: 10.48550/arXiv.1711.05101 - Provides a detailed analysis of weight decay in adaptive optimizers, distinguishing it from standard L2 regularization and explaining its effectiveness.
torch.optim package - Optimizers, PyTorch Developers, 2025 (PyTorch Foundation) - Official documentation for PyTorch optimizers, detailing the weight_decay parameter and its usage for L2 regularization.