Deep Learning, Ian Goodfellow, Yoshua Bengio, and Aaron Courville, 2016 (MIT Press) - A foundational textbook covering the mathematical and algorithmic principles of deep learning, including optimization, loss functions, and neural network training.
A Survey of Large Language Models, Wayne Xin Zhao, Kun Zhou, Junyi Li, Tianyi Tang, Xiaolei Wang, Yupeng Hou, Yingqian Min, Beichen Zhang, Junjie Zhang, Zican Dong, Yifan Du, Chen Yang, Yushuo Chen, Zhipeng Chen, Jinhao Jiang, Ruiyang Ren, Yifan Li, Xinyu Tang, Zikang Liu, Peiyu Liu, Jian-Yun Nie, and Ji-Rong Wen, 2023 (arXiv preprint arXiv:2303.18223; DOI: 10.48550/arXiv.2303.18223) - Provides a comprehensive overview of large language models, including discussions of pre-training objectives, architectures, and general training methodologies.
Cloud TPU for Large Language Model Training, Google Cloud Documentation, 2024 (Google Cloud) - Official documentation explaining how Google Cloud TPUs are used for large-scale language model training, covering hardware and scaling considerations.
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer, Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu, 2020 (J. Mach. Learn. Res., Vol. 21, JMLR) - An influential paper introducing the T5 model and discussing transfer learning, pre-training objectives, and fine-tuning strategies for various NLP tasks.