GPipe: Efficient Training of Giant Models using Pipeline Parallelism, Yanping Huang, Youlong Cheng, Ankur Bapna, Orhan Firat, Dehao Chen, Mia Chen, HyoukJoong Lee, Jiquan Ngiam, Quoc V. Le, Yonghui Wu, Zhifeng Chen, 2019, Advances in Neural Information Processing Systems, Vol. 32 (NeurIPS). DOI: 10.5555/3454287.3455115 - Introduces pipeline parallelism with micro-batching to improve device utilization, reducing the idle time known as the 'pipeline bubble' when training large models.