PyTorch Distributed Overview, PyTorch Core Team, 2024 - Official documentation offering practical guidance on implementing distributed data parallelism with torch.nn.parallel.DistributedDataParallel on top of the torch.distributed package.
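A minimal sketch of the DistributedDataParallel pattern described in that documentation, assuming a single-node launch via torchrun (e.g. `torchrun --nproc_per_node=4 train.py`); the model, batch, and hyperparameters are placeholders, not values from the guide:

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group(backend="nccl")            # one process per GPU; env vars set by torchrun
    local_rank = int(os.environ["LOCAL_RANK"])         # provided by the torchrun launcher
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(128, 10).cuda(local_rank)  # placeholder model
    ddp_model = DDP(model, device_ids=[local_rank])    # wraps the model; gradients are synchronized across ranks

    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.1)
    inputs = torch.randn(32, 128, device=local_rank)                 # placeholder per-process batch
    targets = torch.randint(0, 10, (32,), device=local_rank)

    loss = torch.nn.functional.cross_entropy(ddp_model(inputs), targets)
    loss.backward()                                     # all-reduce of gradients happens during backward()
    optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```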
Distributed training with TensorFlow, TensorFlow Authors, 2024 - Official documentation explaining how to use tf.distribute.Strategy for various distributed training setups, including data parallelism.
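A minimal sketch of the tf.distribute.Strategy usage that guide covers, using MirroredStrategy for synchronous data parallelism on a single machine; the model and dataset are placeholders rather than the guide's examples:

```python
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()            # one replica per visible GPU

with strategy.scope():                                  # variables created here are mirrored across replicas
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu", input_shape=(128,)),
        tf.keras.layers.Dense(10),
    ])
    model.compile(
        optimizer=tf.keras.optimizers.SGD(learning_rate=0.1),
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    )

# Placeholder dataset; Keras fit() splits each global batch across the replicas.
dataset = tf.data.Dataset.from_tensor_slices(
    (tf.random.normal([1024, 128]),
     tf.random.uniform([1024], maxval=10, dtype=tf.int64))
).batch(64)

model.fit(dataset, epochs=1)
```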
Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour, Priya Goyal, Piotr Dollár, Ross Girshick, Pieter Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, Kaiming He, 2017, arXiv preprint arXiv:1706.02677, DOI: 10.48550/arXiv.1706.02677 - Research paper introducing a linear scaling rule for learning rates when using large mini-batches in data-parallel training to maintain optimization stability.
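An illustrative application of the paper's linear scaling rule: when the global mini-batch size grows by a factor k, multiply the learning rate by k. The baseline values (batch size 256, learning rate 0.1) match the ResNet-50 ImageNet setup discussed in the paper; the helper function itself is hypothetical:

```python
def scaled_lr(base_lr: float, base_batch: int, global_batch: int) -> float:
    # Linear scaling rule: lr grows in proportion to the global batch size.
    return base_lr * (global_batch / base_batch)

print(scaled_lr(0.1, 256, 8192))  # 256 -> 8192 is k = 32, so lr = 0.1 * 32 = 3.2
```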