Dropout: A Simple Way to Prevent Neural Networks from Overfitting. Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, Ruslan Salakhutdinov. Journal of Machine Learning Research, Vol. 15, 2014 - Original paper introducing the Dropout regularization technique for neural networks.
Attention Is All You Need. Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin. Advances in Neural Information Processing Systems, Vol. 30, 2017. DOI: 10.48550/arXiv.1706.03762 - Foundational paper introducing the Transformer architecture; details its use of Dropout and Label Smoothing.
When Does Label Smoothing Help?. Rafael Müller, Simon Kornblith, Geoffrey Hinton. NeurIPS, 2019. DOI: 10.48550/arXiv.1906.02629 - Analyzes Label Smoothing Regularization, exploring its benefits for model calibration and generalization.
Dropout (nn.Dropout). PyTorch documentation, 2024 - Official documentation for the PyTorch implementation of the Dropout layer, including usage details.
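Since the entries above cover Dropout both as a technique and as a PyTorch layer, a minimal sketch may be useful. The following NumPy implementation of inverted dropout (the variant `nn.Dropout` uses: zero activations with probability `p` at train time and scale the survivors by `1/(1-p)`, so evaluation needs no rescaling) is illustrative only; the function name and seed are assumptions, not part of any cited work.

```python
import numpy as np

def dropout(x, p=0.5, training=True, rng=None):
    """Inverted dropout: zero each element with prob p, scale survivors by 1/(1-p)."""
    if not training or p == 0.0:
        return x  # at evaluation time dropout is the identity
    rng = rng or np.random.default_rng()
    mask = rng.random(x.shape) >= p  # keep each element with probability 1 - p
    return x * mask / (1.0 - p)     # rescale so E[output] == input

x = np.ones((4, 3))
y = dropout(x, p=0.5, training=True, rng=np.random.default_rng(0))
# surviving entries equal 2.0, dropped entries equal 0.0
```

Because of the `1/(1-p)` scaling, the expected value of each activation is unchanged, which is why the same network can be used unmodified at test time.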