Deep Learning, Ian Goodfellow, Yoshua Bengio, and Aaron Courville, 2016 (MIT Press) - A foundational textbook that covers the theoretical and practical aspects of deep learning, including the distinction between parameters and hyperparameters, various types of hyperparameters, and their impact on model training and performance.
Adam: A Method for Stochastic Optimization, Diederik P. Kingma and Jimmy Ba, 2015 (International Conference on Learning Representations, ICLR), DOI: 10.48550/arXiv.1412.6980 - Introduces the Adam optimizer, a widely adopted adaptive learning rate optimization algorithm. This paper is essential for understanding one of the most common and impactful hyperparameters (optimizer choice) and its relation to the learning rate.
Dropout: A Simple Way to Prevent Neural Networks from Overfitting, Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov, 2014 (Journal of Machine Learning Research, Vol. 15), DOI: 10.5555/2620392.2620461 - Presents dropout as an effective regularization technique. This paper helps in understanding the dropout rate, a crucial hyperparameter for improving model generalization and preventing overfitting.
Random Search for Hyper-Parameter Optimization, James Bergstra and Yoshua Bengio, 2012 (Journal of Machine Learning Research, Vol. 13) - This paper's introduction outlines the challenges associated with hyperparameter tuning, such as large search spaces and interdependencies between hyperparameters, making it relevant for understanding the foundational problems in the field.