SPMD on Multiple Devices with pmap, Vladimir Mikulik, Roman Ring, 2024 - Explains pmap for Single-Program Multiple-Data (SPMD) programming across multiple devices, including its in_axes argument and handling of data and parameter distribution.
Deep Learning, Ian Goodfellow, Yoshua Bengio, and Aaron Courville, 2016 (MIT Press) - Discusses data parallelism and other large-scale distributed training strategies for deep learning models, providing foundational context.
Distributed Machine Learning Patterns, Yuan Tang, 2024 (Manning Publications) - Covers various patterns for distributed machine learning, including data parallelism, model parallelism, and communication strategies, offering practical insights into scaling ML workloads.