On the difficulty of training recurrent neural networks, Razvan Pascanu, Tomas Mikolov, Yoshua Bengio, 2013, Proceedings of the 30th International Conference on Machine Learning, Vol. 28 (PMLR) - A seminal paper that describes the exploding gradient problem in deep networks and proposes gradient clipping as a method for stabilizing training.
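Deep Learning, Ian Goodfellow, Yoshua Bengio, Aaron Courville, 2016 (MIT Press) - A comprehensive textbook on the theoretical foundations of deep learning, with detailed explanations of gradient problems and of techniques such as gradient clipping.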