Deep Learning, Ian Goodfellow, Yoshua Bengio, and Aaron Courville, 2016 (MIT Press) - A fundamental textbook providing a thorough mathematical and conceptual background of deep learning, with sections dedicated to optimization, regularization, and hyperparameter selection.
Random Search for Hyper-Parameter Optimization, James Bergstra and Yoshua Bengio, 2012, Journal of Machine Learning Research, Vol. 13 (JMLR Editorial Board) - The paper that established random search as a more efficient approach to hyperparameter optimization than grid search, particularly when only a subset of hyperparameters significantly affects performance.
Practical Bayesian Optimization of Machine Learning Algorithms, Jasper Snoek, Hugo Larochelle, and Ryan P. Adams, 2012, Advances in Neural Information Processing Systems, Vol. 25 (NeurIPS Proceedings) - An influential paper demonstrating the efficacy of Bayesian optimization for tuning machine learning algorithms, offering practical advice and empirical results on its application.