DistributedDataParallel, PyTorch Documentation, 2023 - Official documentation for PyTorch's primary API for data parallelism, detailing its usage and internal mechanisms for synchronized training.
Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour, Priya Goyal, Piotr Dollár, Ross Girshick, Pieter Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, Kaiming He, 2017, arXiv preprint arXiv:1706.02677, DOI: 10.48550/arXiv.1706.02677 - A foundational paper on techniques for training with large batch sizes, including the linear scaling rule for learning rates, which is important in data-parallel settings (illustrated in the sketch after this list).
NVIDIA Collective Communications Library (NCCL), NVIDIA Corporation, 2023 - Official resource for the highly optimized library that provides collective communication primitives such as AllReduce, which are fundamental to efficient data parallelism on NVIDIA GPUs.
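The three references above fit together in a few lines of code: DistributedDataParallel wraps the model, NCCL supplies the AllReduce used for gradient synchronization, and the learning rate is scaled linearly with the number of workers per Goyal et al. The following is a minimal sketch, assuming a launch via `torchrun`; the toy model, per-GPU batch size of 32, and base learning rate of 0.1 are illustrative assumptions, and the gradual warmup that Goyal et al. pair with linear scaling is omitted for brevity.

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE in the environment.
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # NCCL provides the AllReduce primitive used to synchronize gradients.
    dist.init_process_group(backend="nccl")
    world_size = dist.get_world_size()

    # Toy model for illustration only.
    model = torch.nn.Linear(1024, 10).cuda(local_rank)
    ddp_model = DDP(model, device_ids=[local_rank])

    # Linear scaling rule (Goyal et al., 2017): scale the base learning rate
    # by the number of workers, since the effective batch size grows
    # proportionally. base_lr = 0.1 is an assumed value, not a prescription.
    base_lr = 0.1
    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=base_lr * world_size)

    inputs = torch.randn(32, 1024, device=f"cuda:{local_rank}")
    targets = torch.randint(0, 10, (32,), device=f"cuda:{local_rank}")

    loss = torch.nn.functional.cross_entropy(ddp_model(inputs), targets)
    loss.backward()   # DDP AllReduces gradients across ranks during backward
    optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Launched with, for example, `torchrun --nproc_per_node=8 train.py`, each process drives one GPU and the effective batch size becomes 8 times the per-GPU batch, which is why the learning rate is scaled by `world_size`.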