Decoupled Weight Decay Regularization, Ilya Loshchilov, Frank Hutter, 2019, International Conference on Learning Representations (ICLR). DOI: 10.48550/arXiv.1711.05101 - Introduces decoupled weight decay for adaptive optimizers, addressing the limitations of L2 regularization in Adam.
Rethinking the Inception Architecture for Computer Vision, Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jonathon Shlens, Zbigniew Wojna, 2016, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). DOI: 10.48550/arXiv.1512.00567 - Introduces label smoothing regularization as a method to improve generalization in classification models.
torch.optim.AdamW, PyTorch Authors, 2023, PyTorch documentation - Official documentation for PyTorch's AdamW optimizer, detailing its usage and parameters.
torch.nn.CrossEntropyLoss, PyTorch Authors, 2023, PyTorch documentation - Official documentation for PyTorch's CrossEntropyLoss module, including its support for label smoothing via the label_smoothing parameter.