Decoupled Weight Decay Regularization, Ilya Loshchilov, Frank Hutter, 2019. International Conference on Learning Representations (ICLR). DOI: 10.48550/arXiv.1711.05101 - Introduces AdamW, explaining the separation of weight decay from L2 regularization in adaptive optimizers.