GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism, Yanping Huang, Youlong Cheng, Dehao Chen, Hyoukjin Kwon, Ankur Bapna, Zhifeng Chen, Mia Xu Chen, Jonathan Dean, Marc Edwards, Yuan Gong, Geoffrey Hinton, Lars Jylkka, Sebastian Kastner, Ravi Kumar, Da Li, Quoc V. Le, Jiquan Ngiam, Jeff Norris, Adam Paszke, Alexandre Passos, James Perkins, Sascha Pokrovsky, Jamie Smith, Noam Shazeer, Aurora S. Smith, Barret Zoph, Yonghui Wu, 2019. Advances in Neural Information Processing Systems, Vol. 32 (NeurIPS Proceedings). DOI: 10.5591/978-1-7138-0401-4.neurips-2019-397 - Introduces GPipe, an inter-layer model parallelism method that uses micro-batching to reduce communication overhead and improve device utilization.