Delay-Tolerant Distributed Stochastic Gradient Descent, Xiangru Lian, Ce Zhang, Cho-Jui Hsieh, Kai-Wei Chang, Inderjit S. Dhillon, 2015. Advances in Neural Information Processing Systems 28 (NIPS 2015). - This paper gives a theoretical convergence analysis of asynchronous stochastic gradient descent (SGD), focusing in particular on the effect of gradient staleness and proposing delay-tolerant variants of the algorithm.
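To make the staleness effect concrete, here is a minimal single-process sketch (not taken from the paper) that simulates SGD on a least-squares problem when each gradient is computed from a parameter snapshot that is a fixed number of updates old; the function names and the `staleness` parameter are illustrative assumptions.

```python
import random

def grad(w, x, y):
    # Gradient of the squared loss 0.5 * (w*x - y)^2 with respect to w.
    return (w * x - y) * x

def async_sgd_with_staleness(staleness=4, lr=0.05, steps=200, seed=0):
    """Simulate asynchronous SGD where each gradient is evaluated on a
    parameter value that is up to `staleness` updates old (bounded delay)."""
    rng = random.Random(seed)
    w_true = 2.0          # target parameter of the synthetic regression
    w = 0.0               # current iterate
    history = [w]         # past iterates; stale gradients read from here
    for _ in range(steps):
        x = rng.gauss(0.0, 1.0)
        y = w_true * x + rng.gauss(0.0, 0.1)
        # Read a stale copy of the parameters, as a delayed worker would.
        w_stale = history[max(0, len(history) - 1 - staleness)]
        w -= lr * grad(w_stale, x, y)
        history.append(w)
    return w

if __name__ == "__main__":
    # staleness=0 recovers ordinary serial SGD; larger values add delay noise.
    for tau in (0, 4, 16):
        print(f"staleness={tau:2d}  final w ~ {async_sgd_with_staleness(tau):.3f}")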
More Effective Distributed ML via a Stale Synchronous Parallel Parameter Server, Qirong Ho, James Cipar, Henggang Cui, Seunghak Lee, Jin Kyu Kim, Phillip B. Gibbons, Garth A. Gibson, Greg Ganger, Eric P. Xing, 2013. Advances in Neural Information Processing Systems 26 (NIPS 2013). - This paper introduces the stale synchronous parallel (SSP) paradigm, a hybrid approach that balances the advantages of synchronous and asynchronous updates by permitting bounded gradient staleness.
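As a rough illustration of the bounded-staleness idea (not the paper's parameter-server implementation), the sketch below enforces the SSP rule that a worker may advance its clock only while it stays within `staleness` ticks of the slowest worker; the `SSPClock` class and all other names are hypothetical.

```python
import threading

class SSPClock:
    """Toy stale synchronous parallel clock: a worker may advance to clock c
    only when c - min(clock over all workers) <= the staleness bound."""
    def __init__(self, num_workers, staleness):
        self.clocks = [0] * num_workers
        self.staleness = staleness
        self.cond = threading.Condition()

    def tick(self, worker_id):
        with self.cond:
            self.clocks[worker_id] += 1
            self.cond.notify_all()
            # Block while this worker is more than `staleness` ticks
            # ahead of the slowest worker.
            while self.clocks[worker_id] - min(self.clocks) > self.staleness:
                self.cond.wait()

def worker(clock, wid, iterations):
    for _ in range(iterations):
        # ... compute an update against a possibly stale parameter copy ...
        clock.tick(wid)  # advance the local clock, waiting if too far ahead

if __name__ == "__main__":
    clock = SSPClock(num_workers=4, staleness=2)
    threads = [threading.Thread(target=worker, args=(clock, i, 10))
               for i in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print("all workers finished within the staleness bound:", clock.clocks)
```

Setting `staleness=0` degenerates to fully synchronous (bulk synchronous parallel) execution, while a very large bound approaches fully asynchronous updates, which is the trade-off the SSP model is designed to tune.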