Dropout: A Simple Way to Prevent Neural Networks from Overfitting, Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, Ruslan Salakhutdinov, 2014 (Journal of Machine Learning Research, Vol. 15), DOI: 10.5555/2627435.2670313 - The original paper introducing dropout, detailing its mechanism and the need to scale activations at test time so that the expected output magnitude matches training.
Deep Learning, Ian Goodfellow, Yoshua Bengio, Aaron Courville, 2016 (MIT Press) - A comprehensive textbook explaining dropout, its theoretical basis, and the rationale behind scaling activations at inference time.
Dropout, Stanford University CS231n Course Staff, 2023 - Practical course notes offering clear explanations and intuitive examples of dropout, including implementation details for test-time scaling.