Random Search for Hyper-Parameter Optimization, James Bergstra and Yoshua Bengio, 2012. Journal of Machine Learning Research, Vol. 13 (Microtome Publishing). DOI: 10.1622/jmlr.v13.12-140 - The original paper introducing random search as a more efficient alternative to grid search for hyperparameter optimization.
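A minimal sketch of the idea: instead of enumerating a grid, each trial samples every hyperparameter independently from its search distribution. The `train_and_eval` callable and the particular search space here are hypothetical stand-ins, not from the paper.

```python
import random

def random_search(train_and_eval, n_trials=60, seed=0):
    """Sample n_trials configurations at random; return the best one found."""
    rng = random.Random(seed)
    best_score, best_cfg = float("-inf"), None
    for _ in range(n_trials):
        cfg = {
            # log-uniform sample for the learning rate in [1e-5, 1e-1]
            "lr": 10 ** rng.uniform(-5, -1),
            "batch_size": rng.choice([32, 64, 128, 256]),
            "dropout": rng.uniform(0.0, 0.5),
        }
        score = train_and_eval(cfg)  # user-supplied: trains a model, returns validation score
        if score > best_score:
            best_score, best_cfg = score, cfg
    return best_cfg, best_score
```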
GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium, Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter, 2017. Advances in Neural Information Processing Systems (NeurIPS), Vol. 30 (Curran Associates, Inc.) - Introduces the Two Time-Scale Update Rule (TTUR) for stable GAN training and the Fréchet Inception Distance (FID) for evaluating GAN sample quality.
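The FID between two activation distributions with means mu1, mu2 and covariances C1, C2 is ||mu1 - mu2||^2 + Tr(C1 + C2 - 2(C1 C2)^(1/2)). A sketch of that computation, assuming Inception activations have already been extracted into NumPy arrays:

```python
import numpy as np
from scipy.linalg import sqrtm

def fid(act_real, act_fake):
    """Fréchet Inception Distance between two (N, d) activation matrices."""
    mu1, mu2 = act_real.mean(axis=0), act_fake.mean(axis=0)
    c1 = np.cov(act_real, rowvar=False)
    c2 = np.cov(act_fake, rowvar=False)
    covmean = sqrtm(c1 @ c2)  # matrix square root of the covariance product
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # drop tiny imaginary parts from numerical error
    return float(np.sum((mu1 - mu2) ** 2) + np.trace(c1 + c2 - 2.0 * covmean))
```

The TTUR itself amounts to running the critic and generator optimizers with separate learning rates, which needs no dedicated code beyond configuring two optimizers.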
Population Based Training of Neural Networks, Max Jaderberg, Valentin Dalibard, Simon Osindero, Wojciech M. Czarnecki, Jeff Donahue, Ali Razavi, Oriol Vinyals, Tim Green, Iain Dunning, Karen Simonyan, Chrisantha Fernando, and Koray Kavukcuoglu, 2017. arXiv preprint arXiv:1711.09846. DOI: 10.48550/arXiv.1711.09846 - Proposes Population Based Training (PBT), a method that combines hyperparameter optimization and model training by concurrently training a population of models.
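A toy sketch of the PBT loop under strong simplifying assumptions: a serial (not distributed) population, and hypothetical `train_fn`/`eval_fn` callables supplied by the user. Weak members periodically copy the weights of strong members (exploit) and perturb their hyperparameters (explore).

```python
import copy
import random

def pbt(population, rounds, train_fn, eval_fn, perturb=0.2, seed=0):
    """population: list of dicts {"params": ..., "hparams": {...}, "score": 0.0}"""
    rng = random.Random(seed)
    for _ in range(rounds):
        for member in population:
            train_fn(member)               # run a few optimization steps with member["hparams"]
            member["score"] = eval_fn(member)
        population.sort(key=lambda m: m["score"], reverse=True)
        q = max(1, len(population) // 4)   # quartile size
        top, bottom = population[:q], population[-q:]
        for loser in bottom:
            winner = rng.choice(top)
            loser["params"] = copy.deepcopy(winner["params"])    # exploit: clone weights
            loser["hparams"] = {                                  # explore: perturb hparams
                k: v * rng.choice([1 - perturb, 1 + perturb])
                for k, v in winner["hparams"].items()
            }
    return max(population, key=lambda m: m["score"])
```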
Improved Training of Wasserstein GANs, Ishaan Gulrajani, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin, and Aaron Courville, 2017. Advances in Neural Information Processing Systems (NeurIPS), Vol. 30. DOI: 10.5555/3295222.3295327 - Introduces the WGAN-GP formulation, which uses a gradient penalty to enforce the Lipschitz constraint, significantly improving GAN training stability.
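The gradient penalty is computed on random interpolations between real and generated samples, penalizing the critic's gradient norm for deviating from 1. A PyTorch sketch, assuming image-shaped (N, C, H, W) batches and a user-supplied `critic` module:

```python
import torch

def gradient_penalty(critic, real, fake):
    """WGAN-GP penalty: E[(||grad critic(x_interp)|| - 1)^2]."""
    eps = torch.rand(real.size(0), 1, 1, 1, device=real.device)  # per-sample mixing weight
    interp = (eps * real + (1 - eps) * fake).requires_grad_(True)
    scores = critic(interp)
    grads = torch.autograd.grad(
        outputs=scores,
        inputs=interp,
        grad_outputs=torch.ones_like(scores),
        create_graph=True,  # keep the graph so the penalty itself is differentiable
    )[0]
    grad_norm = grads.view(grads.size(0), -1).norm(2, dim=1)
    return ((grad_norm - 1) ** 2).mean()
```

In training, this term is scaled by a coefficient (lambda = 10 in the paper) and added to the critic's loss in place of weight clipping.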