QSGD: Communication-Efficient SGD via Gradient Quantization and Encoding, Dan Alistarh, Demjan Grubic, Jerry Li, Ryota Tomioka, Milan Vojnovic, 2017, Advances in Neural Information Processing Systems, Vol. 30 (NeurIPS) - This paper introduces Quantized SGD (QSGD), a foundational stochastic quantization scheme for communication-efficient distributed training, directly relevant to gradient quantization; see the sketch below.
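As background, a minimal NumPy sketch of the unbiased stochastic quantizer at the core of QSGD. The function name `qsgd_quantize` and the default of s = 256 levels are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def qsgd_quantize(v, s=256, rng=None):
    """Stochastically quantize vector v to s levels so that E[q] = v."""
    rng = rng or np.random.default_rng()
    norm = np.linalg.norm(v)
    if norm == 0.0:
        return np.zeros_like(v)
    # Scaled magnitudes lie in [0, s]; l is the lower quantization level.
    scaled = np.abs(v) / norm * s
    l = np.floor(scaled)
    # Round up to l + 1 with probability equal to the fractional part,
    # which is exactly what makes the quantizer unbiased.
    l += rng.random(v.shape) < (scaled - l)
    return np.sign(v) * norm * (l / s)
```

Each worker would transmit only the norm, the signs, and the small integer levels, which the paper additionally entropy-codes to cut bandwidth further.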
Sparsified SGD with Memory, Sebastian U. Stich, Jean-Baptiste Cordonnier, Martin Jaggi, 2018, Advances in Neural Information Processing Systems, Vol. 31 (NeurIPS) - This research analyzes sparsified SGD with Top-k selection and an error-feedback memory that accumulates the discarded gradient components for later transmission, directly addressing gradient sparsification; a sketch follows.
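A minimal sketch of Top-k sparsification with the error memory analyzed in this paper; the class name and interface are hypothetical:

```python
import numpy as np

class TopKWithMemory:
    """Keep only the k largest-magnitude gradient entries; fold the
    discarded remainder back into the next step (error feedback)."""
    def __init__(self, k):
        self.k = k
        self.residual = None

    def compress(self, grad):
        if self.residual is None:
            self.residual = np.zeros_like(grad)
        acc = grad + self.residual            # add back past error
        idx = np.argpartition(np.abs(acc), -self.k)[-self.k:]
        sparse = np.zeros_like(acc)
        sparse[idx] = acc[idx]                # transmit only these entries
        self.residual = acc - sparse          # remember what was dropped
        return sparse
```

The memory is what restores convergence: without it, information below the Top-k threshold is lost forever, while with it, every coordinate is eventually transmitted.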
Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training, Yujun Lin, Song Han, Huizi Mao, Yu Wang, William J. Dally, 2018, International Conference on Learning Representations (ICLR), DOI: 10.48550/arXiv.1712.01887 - This paper presents an influential method that combines aggressive Top-k sparsification with momentum correction, local gradient clipping, and momentum factor masking to achieve large communication reductions in distributed training without losing accuracy; see the sketch below.
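A rough, simplified sketch of the DGC recipe (momentum correction, Top-k selection, momentum factor masking) on a flat gradient vector. The class name, defaults, and single-vector interface are assumptions for illustration, and the paper's warm-up schedule and gradient clipping are omitted:

```python
import numpy as np

class DGCCompressor:
    """Accumulate momentum-corrected gradients locally and send only
    the largest-magnitude entries each step."""
    def __init__(self, dim, sparsity=0.999, momentum=0.9):
        self.m = momentum
        self.k = max(1, int(dim * (1 - sparsity)))
        self.velocity = np.zeros(dim)   # local momentum buffer
        self.acc = np.zeros(dim)        # accumulated, not-yet-sent update

    def step(self, grad):
        # Momentum correction: accumulate velocity, not raw gradients.
        self.velocity = self.m * self.velocity + grad
        self.acc += self.velocity
        # Send only the k largest-magnitude accumulated entries.
        idx = np.argpartition(np.abs(self.acc), -self.k)[-self.k:]
        sparse = np.zeros_like(self.acc)
        sparse[idx] = self.acc[idx]
        # Momentum factor masking: clear what was just sent so stale
        # momentum does not push those coordinates again.
        self.acc[idx] = 0.0
        self.velocity[idx] = 0.0
        return sparse
```

Momentum correction is the key difference from plain Top-k with memory: accumulating velocity rather than raw gradients preserves the optimizer's momentum dynamics even when most entries are delayed.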