Adam: A Method for Stochastic Optimization, Diederik P. Kingma and Jimmy Ba, 2015. 3rd International Conference on Learning Representations. DOI: 10.48550/arXiv.1412.6980 - Presents the Adam optimizer, which is widely adopted for training and fine-tuning deep learning models, including large language models.
Random Search for Hyper-Parameter Optimization, James Bergstra and Yoshua Bengio, 2012. Journal of Machine Learning Research, Vol. 13 - Demonstrates that random search is often more efficient than grid search for hyperparameter optimization.