On the difficulty of training recurrent neural networks, Razvan Pascanu, Tomas Mikolov, Yoshua Bengio, 2013Proceedings of the 30th International Conference on Machine Learning, Vol. 28 (PMLR) - Foundational paper identifying exploding gradients in deep networks and proposing gradient clipping as a method to stabilize training.
Deep Learning, Ian Goodfellow, Yoshua Bengio, Aaron Courville, 2016 (MIT Press) - Comprehensive textbook covering theoretical foundations of deep learning, including detailed explanations of gradient issues and regularization techniques like gradient clipping.
torch.nn.utils.clip_grad_norm_, PyTorch Team, 2023 - Official PyTorch documentation for the gradient clipping utility, providing practical implementation details and usage examples.