GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism, Yanping Huang, Youlong Cheng, Ankur Bapna, Orhan Firat, Dehao Chen, Mia Chen, HyoukJoong Lee, Jiquan Ngiam, Quoc V Le, Yonghui Wu, Zhifeng Chen, 2019. Advances in Neural Information Processing Systems (NeurIPS), Vol. 32. DOI: 10.5555/3454287.3455171 - Introduces pipeline parallelism with micro-batching for training large neural networks, a foundational work on the topic.