Training truly large models requires more than an understanding of parallelism strategies such as data parallelism (DP), tensor parallelism (TP), and pipeline parallelism (PP); it also requires tools built to manage the complexity of applying them. This chapter moves from the concepts discussed previously to practical implementation with specialized frameworks.
You will learn how to apply these strategies by configuring and using popular libraries such as DeepSpeed and Megatron-LM. We cover setting up DeepSpeed's ZeRO memory optimizations (stages 1, 2, and 3) and configuring tensor and pipeline parallelism in Megatron-LM. By the end, you will be prepared to translate distributed training theory into practice for your own large model projects.
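As a preview of the kind of configuration the DeepSpeed sections cover, the sketch below shows one common way to enable ZeRO stage 2: passing a configuration dict to deepspeed.initialize. The model, batch size, and optimizer settings are illustrative placeholders, not values prescribed by this chapter.

```python
import torch
import deepspeed

# Placeholder model; in practice this would be your transformer or other large model.
model = torch.nn.Linear(1024, 1024)

# Minimal DeepSpeed configuration enabling mixed precision and ZeRO stage 2.
# Stage 1 partitions optimizer states, stage 2 additionally partitions gradients,
# and stage 3 also partitions the model parameters themselves.
ds_config = {
    "train_micro_batch_size_per_gpu": 8,
    "optimizer": {"type": "AdamW", "params": {"lr": 1e-4}},
    "fp16": {"enabled": True},
    "zero_optimization": {"stage": 2},
}

# deepspeed.initialize wraps the model in a training engine that manages the
# distributed optimizer, gradient partitioning, and loss scaling.
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```

In Megatron-LM, the analogous choices are made through launch arguments such as --tensor-model-parallel-size and --pipeline-model-parallel-size, which later sections walk through in detail.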
16.1 Overview of Distributed Training Libraries
16.2 Introduction to DeepSpeed
16.3 Using DeepSpeed ZeRO Optimizations
16.4 Introduction to Megatron-LM
16.5 Configuring Tensor and Pipeline Parallelism in Megatron-LM
16.6 Combining Frameworks and Strategies