Configuring Training Arguments and Hyperparameters
Deep Learning, Ian Goodfellow, Yoshua Bengio, and Aaron Courville, 2016 (MIT Press) - A foundational textbook covering deep learning optimization principles, including gradient descent, learning rates, batch size effects, and regularization techniques such as weight decay.
Adam: A Method for Stochastic Optimization, Diederik P. Kingma and Jimmy Ba, 2014, International Conference on Learning Representations (ICLR), DOI: 10.48550/arXiv.1412.6980 - Introduces the Adam optimizer, a widely used adaptive learning rate optimization algorithm often paired with learning rate schedules and weight decay in deep learning training.
transformers.TrainingArguments, Hugging Face, 2024 (Hugging Face) - Official documentation for the TrainingArguments class in the Hugging Face transformers library, detailing the configuration options for fine-tuning models.
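As a concrete illustration of the options these references cover, the minimal sketch below configures learning rate, batch size, weight decay, and a warmup-plus-linear-decay schedule through the TrainingArguments class. The specific values are illustrative assumptions, not tuned recommendations for any particular model or dataset.

```python
# Minimal sketch of a TrainingArguments configuration; the hyperparameter
# values below are assumptions chosen for illustration only.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./results",          # where checkpoints and logs are written
    learning_rate=2e-5,              # peak learning rate for the optimizer
    per_device_train_batch_size=16,  # batch size per GPU/CPU device
    num_train_epochs=3,              # total passes over the training set
    weight_decay=0.01,               # decoupled weight decay regularization
    warmup_ratio=0.1,                # fraction of steps spent warming up the learning rate
    lr_scheduler_type="linear",      # decay the learning rate linearly after warmup
)
```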