Delay-Tolerant Distributed Stochastic Gradient Descent, Xiangru Lian, Ce Zhang, Cho-Jui Hsieh, Kai-Wei Chang, Inderjit S. Dhillon, 2015. Advances in Neural Information Processing Systems 28 (NeurIPS 2015). - This paper provides a theoretical convergence analysis of asynchronous SGD, quantifying the impact of gradient staleness and proposing delay-tolerant update rules that remain convergent when gradients arrive late.
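To make the staleness problem concrete, here is a minimal single-process Python sketch of asynchronous SGD on a toy least-squares problem. The simulated delays, the toy data, and the eta / (1 + tau) staleness-dependent step size are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 2))                 # toy least-squares data
y = X @ np.array([1.0, -2.0]) + 0.01 * rng.normal(size=256)

w = np.zeros(2)                               # shared model parameters
base_lr = 0.1

def gradient(w_snapshot, batch):
    xb, yb = X[batch], y[batch]
    return 2.0 * xb.T @ (xb @ w_snapshot - yb) / len(batch)

pending = []                                  # gradients still "in flight"
for step in range(500):
    # A worker reads the current parameters and computes a gradient; the
    # result arrives after a random delay, so it is stale when applied.
    batch = rng.choice(len(X), size=32, replace=False)
    delay = int(rng.integers(0, 5))
    pending.append((step + delay, step, gradient(w.copy(), batch)))

    # Apply every gradient whose delay has elapsed, shrinking the step size
    # with the staleness tau of the snapshot it was computed from.
    ready = [p for p in pending if p[0] <= step]
    pending = [p for p in pending if p[0] > step]
    for _, born, g in ready:
        tau = step - born                     # staleness in update steps
        w -= (base_lr / (1.0 + tau)) * g

print("estimated parameters:", w)             # close to [1.0, -2.0]
```

The intuition behind delay-tolerant step-size rules is visible in the update loop: a gradient computed from an older snapshot moves the current parameters by less.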
More Effective Distributed ML via a Stale Synchronous Parallel Parameter Server, Qirong Ho, James Cipar, Henggang Cui, Seunghak Lee, Jin Kyu Kim, Phillip B. Gibbons, Garth A. Gibson, Greg Ganger, Eric P. Xing, 2013. Advances in Neural Information Processing Systems 26 (NIPS 2013). - This paper introduces the Stale Synchronous Parallel (SSP) paradigm, a hybrid that balances synchronous and asynchronous updates: workers may run ahead of one another, but only by a bounded number of clock ticks of staleness.
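The bounded-staleness condition at the heart of SSP is short enough to state in code. Below is a minimal Python sketch using threads and a condition variable; the names (worker_clock, advance_clock) and the in-process setup are assumptions for illustration, whereas the paper enforces the same condition inside a parameter server (SSPTable).

```python
import threading

NUM_WORKERS = 4
STALENESS_BOUND = 3                  # the "s" in SSP: max allowed clock gap

worker_clock = [0] * NUM_WORKERS     # per-worker iteration counters
cv = threading.Condition()

def advance_clock(worker_id: int) -> None:
    """Advance this worker's clock, blocking while the advance would put it
    more than STALENESS_BOUND ticks ahead of the slowest worker."""
    with cv:
        while worker_clock[worker_id] + 1 - min(worker_clock) > STALENESS_BOUND:
            cv.wait()
        worker_clock[worker_id] += 1
        cv.notify_all()              # a slow worker advancing may unblock others

def worker(worker_id: int, iterations: int) -> None:
    for _ in range(iterations):
        # ... compute a gradient against a possibly stale parameter copy ...
        advance_clock(worker_id)

threads = [threading.Thread(target=worker, args=(i, 20)) for i in range(NUM_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("final clocks:", worker_clock)  # all 20: progress stayed within the bound
```

Setting STALENESS_BOUND to 0 recovers fully synchronous (BSP) execution, while letting it grow without bound recovers fully asynchronous execution, which is exactly the trade-off the paper tunes.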
Distributed Deep Learning: A Guide to the State-of-the-Art, Tal Ben-Nun, Torsten Hoefler, 2019. Communications of the ACM, Vol. 62 (ACM). DOI: 10.1145/3291040 - This survey provides a comprehensive overview of distributed deep learning techniques, including detailed discussions of synchronous and asynchronous training strategies, communication patterns, and their trade-offs.
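As a concrete companion to the survey's synchronous-versus-asynchronous discussion, the sketch below runs both strategies on the same toy problem in plain NumPy. The four-worker sharding, the step counts, and the round-robin stand-in for asynchronous arrival order are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(512, 3))        # toy least-squares data
y = X @ np.array([0.5, -1.0, 2.0]) + 0.01 * rng.normal(size=512)
shards = np.array_split(np.arange(512), 4)   # one data shard per "worker"

def grad(w, idx):
    return 2.0 * X[idx].T @ (X[idx] @ w - y[idx]) / len(idx)

# Synchronous: every step waits for all workers, then applies the average
# gradient (the role an all-reduce plays in a real cluster).
w_sync = np.zeros(3)
for _ in range(200):
    g = np.mean([grad(w_sync, s) for s in shards], axis=0)
    w_sync -= 0.1 * g

# Asynchronous: workers update the latest parameters one at a time, so no
# one waits, but each gradient may be stale relative to the others.
w_async = np.zeros(3)
for _ in range(200):
    for s in shards:                 # round-robin stands in for arrival order
        w_async -= 0.1 * grad(w_async, s)

print("sync :", w_sync)              # both approach [0.5, -1.0, 2.0]
print("async:", w_async)
```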