Distributed training with TensorFlow, TensorFlow Authors, 2024 (TensorFlow) - Official guide to TensorFlow's tf.distribute.Strategy API, covering various distributed training setups and examples; a minimal usage sketch follows this list.
Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour, Priya Goyal, Piotr Dollár, Ross Girshick, Pieter Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, Kaiming He, 2017 (arXiv preprint arXiv:1706.02677), DOI: 10.48550/arXiv.1706.02677 - A seminal paper demonstrating synchronous data parallelism with large batch sizes for accelerating deep learning training, illustrating its practical benefits and challenges; its linear scaling rule with gradual warmup is sketched after this list.
Deep Learning, Ian Goodfellow, Yoshua Bengio, and Aaron Courville, 2016 (MIT Press) - An authoritative textbook with a dedicated section on parallel and distributed training strategies, including foundational concepts of data and model parallelism.
GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism, Yanping Huang, Youlong Cheng, Ankur Bapna, Orhan Firat, Mia Xu Chen, Dehao Chen, HyoukJoong Lee, Jiquan Ngiam, Quoc V. Le, Yonghui Wu, Zhifeng Chen, 2019 (Advances in Neural Information Processing Systems, Vol. 32, NIPS Foundation), DOI: 10.48550/arXiv.1811.06965 - Introduces pipeline parallelism to improve the efficiency of model parallelism, addressing the hardware underutilization that arises when an extremely large model is split naively across devices; the pipeline-bubble arithmetic is sketched after this list.
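
To make the first entry concrete, here is a minimal sketch of the tf.distribute.Strategy API it documents, assuming a standard TensorFlow 2.x installation; the toy model, optimizer, and layer sizes are placeholder choices, not taken from the guide.

```python
import tensorflow as tf

# Synchronous data parallelism across all GPUs visible on one machine
# (falls back to a single device if no GPU is present).
strategy = tf.distribute.MirroredStrategy()
print("Number of replicas:", strategy.num_replicas_in_sync)

with strategy.scope():
    # Variables created inside the scope are mirrored on every replica.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(10),
    ])
    model.compile(
        optimizer="sgd",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    )

# model.fit(dataset) would then shard each global batch across the replicas.
```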
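Likewise, the core recipe of Goyal et al. can be summarized in a few lines: scale the learning rate linearly with the global batch size and ramp up to it gradually. The sketch below is plain Python under that recipe; the 0.1 base rate per 256 images matches the paper's ResNet-50 setup, while the function name and the 3,000-step warmup in the example are illustrative.

```python
def scaled_learning_rate(step, global_batch_size, warmup_steps,
                         base_lr=0.1, base_batch_size=256):
    """Linear scaling rule with gradual warmup (after Goyal et al., 2017).

    When the minibatch grows by a factor k = global_batch_size / base_batch_size,
    the target learning rate grows by the same factor; during warmup the rate
    ramps linearly from base_lr up to that target to keep early training stable.
    """
    k = global_batch_size / base_batch_size
    target_lr = base_lr * k
    if step < warmup_steps:
        return base_lr + (target_lr - base_lr) * step / warmup_steps
    return target_lr

# Example: an 8192-image global batch targets lr = 0.1 * 8192 / 256 = 3.2.
print(scaled_learning_rate(step=0, global_batch_size=8192, warmup_steps=3000))     # 0.1
print(scaled_learning_rate(step=3000, global_batch_size=8192, warmup_steps=3000))  # 3.2
```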
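Finally, a rough back-of-the-envelope for GPipe's motivation: with p pipeline stages and m micro-batches per mini-batch, roughly p - 1 of the m + p - 1 schedule slots are spent filling and draining the pipeline, so splitting the batch into more micro-batches shrinks the idle "bubble". The helper below is an illustrative simplification that ignores stage imbalance and communication costs.

```python
def pipeline_bubble_fraction(num_stages: int, num_micro_batches: int) -> float:
    """Approximate idle fraction of a GPipe-style pipeline schedule."""
    p, m = num_stages, num_micro_batches
    return (p - 1) / (m + p - 1)

# Without micro-batching (m = 1) a 4-stage pipeline idles 75% of the time;
# with 32 micro-batches the bubble drops below 9%.
print(pipeline_bubble_fraction(4, 1))    # 0.75
print(pipeline_bubble_fraction(4, 32))   # ~0.086
```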