On the difficulty of training recurrent neural networks, Razvan Pascanu, Tomas Mikolov, Yoshua Bengio, 2013, Proceedings of the 30th International Conference on Machine Learning (PMLR), Vol. 28 - This paper analyzes the vanishing and exploding gradient problems in recurrent neural networks and proposes gradient norm clipping as a remedy for exploding gradients.
torch.nn.utils.clip_grad_norm_, PyTorch Developers, 2024 (PyTorch Foundation) - Official PyTorch documentation for the gradient clipping utility function.
Deep Learning, Ian Goodfellow, Yoshua Bengio, Aaron Courville, 2016 (MIT Press) - A textbook covering core deep learning methods, including optimization strategies.