Random Search for Hyper-Parameter Optimization, James Bergstra and Yoshua Bengio, 2012, Journal of Machine Learning Research, Vol. 13 - Introduces random search as an efficient alternative to grid search for hyperparameter optimization, showing its effectiveness when only a few hyperparameters are important.
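A minimal sketch of the random-search idea discussed in this paper: sample hyperparameter configurations independently at random instead of walking a grid. The `objective` function and the specific search space below are hypothetical stand-ins for training a model and reading off its validation score.

```python
import random

# Hypothetical objective; in practice this would train a model with the given
# hyperparameters and return its validation score.
def objective(lr, n_hidden, dropout):
    return -(lr - 0.01) ** 2 - (n_hidden - 128) ** 2 * 1e-6 - (dropout - 0.3) ** 2

best_score, best_cfg = float("-inf"), None
for _ in range(60):  # 60 independent random trials
    cfg = {
        "lr": 10 ** random.uniform(-4, -1),   # sample the learning rate on a log scale
        "n_hidden": random.randint(32, 512),
        "dropout": random.uniform(0.0, 0.5),
    }
    score = objective(**cfg)
    if score > best_score:
        best_score, best_cfg = score, cfg

print(best_cfg, best_score)
```

Because each trial draws every hyperparameter afresh, the important dimensions get many distinct values even when the unimportant ones dominate the space, which is the paper's central argument for random search over grid search.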
Practical Bayesian Optimization of Machine Learning Algorithms, Jasper Snoek, Hugo Larochelle, Ryan P. Adams, 2012, Advances in Neural Information Processing Systems (NIPS), Vol. 25 - Provides a practical guide and demonstrations of Bayesian optimization using Gaussian processes for tuning machine learning algorithms, emphasizing its sample efficiency.
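A sketch of the general GP-based Bayesian optimization loop, here expressed with the scikit-optimize library rather than the paper's own implementation; the search space and the toy `objective` below are assumptions for illustration only.

```python
# pip install scikit-optimize
from skopt import gp_minimize
from skopt.space import Real, Integer

space = [
    Real(1e-4, 1e-1, prior="log-uniform", name="lr"),
    Integer(32, 512, name="n_hidden"),
]

def objective(params):
    lr, n_hidden = params
    # Stand-in for a (negated) validation score from training with these settings.
    return (lr - 0.01) ** 2 + ((n_hidden - 128) ** 2) * 1e-6

# A Gaussian process models the objective; each new trial is placed where the
# Expected Improvement acquisition function balances exploration and exploitation.
result = gp_minimize(objective, space, n_calls=25, acq_func="EI", random_state=0)
print(result.x, result.fun)
```

The sample efficiency comes from spending each expensive training run where the surrogate model predicts the most useful information, rather than at random or on a grid.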
Population Based Training of Neural Networks, Max Jaderberg, Valentin Dalibard, Simon Osindero, Wojciech M. Czarnecki, Jeff Donahue, Ali Razavi, Oriol Vinyals, Tim Green, Iain Dunning, Karen Simonyan, Chrisantha Fernando, Koray Kavukcuoglu, 2017, arXiv preprint arXiv:1711.09846 - Introduces Population Based Training, a method that jointly optimizes hyperparameters and neural network weights during training, allowing for adaptive hyperparameter schedules.
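A toy sketch of the exploit/explore loop at the heart of Population Based Training: poorly performing population members copy the weights and hyperparameters of better ones, then perturb those hyperparameters and continue training. The `train_step` and `evaluate` functions and the scalar "weights" are hypothetical placeholders, not the paper's setup.

```python
import copy
import random

def train_step(weights, hyper):
    return weights + hyper["lr"]      # stand-in for a short burst of training

def evaluate(weights):
    return -abs(weights - 1.0)        # stand-in for validation performance

# A small population of workers, each with its own weights and hyperparameters.
population = [{"weights": 0.0, "hyper": {"lr": 10 ** random.uniform(-3, -1)}}
              for _ in range(8)]

for step in range(20):
    for member in population:
        member["weights"] = train_step(member["weights"], member["hyper"])
        member["score"] = evaluate(member["weights"])
    population.sort(key=lambda m: m["score"], reverse=True)
    # Exploit: the worst members copy weights and hyperparameters from the best.
    for loser, winner in zip(population[-2:], population[:2]):
        loser["weights"] = winner["weights"]
        loser["hyper"] = copy.deepcopy(winner["hyper"])
        # Explore: perturb the copied hyperparameters, yielding an adaptive schedule.
        loser["hyper"]["lr"] *= random.choice([0.8, 1.2])

print(population[0]["hyper"], population[0]["score"])
```

Because hyperparameters are mutated while weights keep training, the effective hyperparameter values change over the course of a single run, which is what distinguishes this approach from search methods that fix a configuration per trial.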