Data Parallelism with DistributedDataParallel (DDP)
Distributed Data Parallel, PyTorch Documentation, 2024 (PyTorch Foundation) - Official guide to PyTorch's DistributedDataParallel, covering its architecture, usage, and recommended practices.
torch.distributed package, PyTorch Documentation, 2024 (PyTorch Foundation) - Documents the torch.distributed package, including init_process_group and the collective operations DDP relies on; a minimal setup sketch follows this list.
Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour, Priya Goyal, Piotr Dollár, Ross Girshick, Pieter Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, Kaiming He, 2017 (Facebook AI Research), DOI: 10.48550/arXiv.1706.02677 - Addresses the challenges of training deep neural networks with large mini-batch sizes, introducing the linear learning-rate scaling rule with gradual warmup alongside efficient distributed gradient aggregation; see the scaling sketch below.
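To make the moving parts in the first two references concrete, here is a minimal sketch of single-node, multi-GPU DDP training. It assumes a torchrun launch (which sets the RANK, LOCAL_RANK, and WORLD_SIZE environment variables); the toy model, dataset, and hyperparameters are placeholders of my own, while init_process_group, DistributedSampler, DistributedDataParallel, and destroy_process_group are the actual torch.distributed/DDP API.

```python
# Minimal DDP sketch. Assumes launch via:
#   torchrun --nproc_per_node=NUM_GPUS ddp_sketch.py
# torchrun sets RANK / LOCAL_RANK / WORLD_SIZE for each process.
import os

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset


def main():
    # One process per GPU; NCCL is the usual backend for CUDA tensors.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Toy model and random data, purely for illustration.
    model = nn.Linear(32, 10).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])

    dataset = TensorDataset(torch.randn(1024, 32),
                            torch.randint(0, 10, (1024,)))
    # DistributedSampler gives each rank a disjoint shard of the dataset.
    sampler = DistributedSampler(dataset)
    loader = DataLoader(dataset, batch_size=64, sampler=sampler)

    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = nn.CrossEntropyLoss()

    for epoch in range(2):
        sampler.set_epoch(epoch)  # reshuffle the shards each epoch
        for x, y in loader:
            x, y = x.cuda(local_rank), y.cuda(local_rank)
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()  # DDP all-reduces gradients during backward
            optimizer.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

Because gradients are averaged across ranks during backward, each process ends every step with identical parameters, so no extra synchronization of the model is needed.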
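The Goyal et al. paper's linear scaling rule says that when the global mini-batch size grows by a factor of k, the learning rate should grow by k as well, reached via a gradual warmup from the small-batch rate. A small illustrative sketch follows; the function and variable names are my own, not from the paper.

```python
# Linear LR scaling with gradual warmup, after Goyal et al. (2017).
# Names here are illustrative, not taken from the paper or any library.

def scaled_lr(base_lr: float, base_batch: int, global_batch: int) -> float:
    """Linear scaling rule: lr = base_lr * (global_batch / base_batch)."""
    return base_lr * global_batch / base_batch


def warmup_lr(base_lr: float, target_lr: float,
              step: int, warmup_steps: int) -> float:
    """Gradual warmup: ramp linearly from base_lr to target_lr, then hold."""
    if step >= warmup_steps:
        return target_lr
    return base_lr + (target_lr - base_lr) * step / warmup_steps


# The paper's ImageNet setting: lr 0.1 at batch 256 scales to 3.2 at batch 8192.
lr = scaled_lr(base_lr=0.1, base_batch=256, global_batch=8192)
assert abs(lr - 3.2) < 1e-9
```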