Communication Overhead Analysis
