Weights & Biases Documentation: Tracking Large Language Models, Weights & Biases, accessed 2025 - Official documentation providing practical guidance and best practices for experiment tracking tailored to large language models, covering metrics, artifacts, and system health.
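For context, a minimal sketch of the kind of tracking loop this documentation describes, using the public `wandb` Python API; the project name, hyperparameter values, and file path are illustrative assumptions, not taken from the docs:

```python
import wandb

# Start a tracked run; project name and config values are illustrative.
run = wandb.init(
    project="llm-pretraining",
    config={
        "learning_rate": 2e-4,
        "global_batch_size": 1024,
        "sequence_length": 2048,
    },
)

# Log training metrics at a given step; wandb also records system health
# (GPU utilization, memory) automatically in the background.
run.log({"train/loss": 2.31, "train/lr": 2e-4}, step=100)

# Version a checkpoint as an artifact tied to this run.
artifact = wandb.Artifact("checkpoint", type="model")
artifact.add_file("model.pt")  # illustrative local path
run.log_artifact(artifact)
run.finish()
```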
DeepSpeed: System Optimizations for Large-Scale Model Training, Samyam Rajbhandari, Cong Guo, Jeff Rasley, Shaden Smith, Yuxiong He, 2020, arXiv preprint arXiv:2008.01666, DOI: 10.48550/arXiv.2008.01666 - Introduces DeepSpeed, a distributed training framework whose optimizations (such as ZeRO) require detailed tracking of distributed configurations and resource utilization, as discussed in the section.
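To illustrate the tracking burden this annotation refers to, a sketch of recording a ZeRO configuration alongside the run metadata; the config keys shown are standard DeepSpeed fields, but the values and project name are assumptions:

```python
import wandb

# Representative DeepSpeed/ZeRO settings; values are illustrative.
ds_config = {
    "train_batch_size": 512,
    "gradient_accumulation_steps": 4,
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,                              # ZeRO stage (1/2/3) changes memory/communication trade-offs
        "offload_optimizer": {"device": "cpu"},  # optimizer-state offload
    },
}

# Store the full distributed configuration with the tracked run so that
# runs with different partitioning or offload settings remain comparable.
run = wandb.init(project="llm-pretraining", config={"deepspeed": ds_config})
run.finish()
```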
Language Models are Few-Shot Learners, Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, et al., 2020, Advances in Neural Information Processing Systems, Vol. 33, arXiv:2005.14165 - Details the architecture and training process of GPT-3, exemplifying the vast scale of LLM training and the importance of tracking extensive hyperparameters and configurations for such models.