On the difficulty of training recurrent neural networks, Razvan Pascanu, Tomas Mikolov, and Yoshua Bengio, 2013. Proceedings of the 30th International Conference on Machine Learning (ICML), Vol. 28 (Proceedings of Machine Learning Research). DOI: 10.55982/pascanu13.1310 - A foundational paper that rigorously analyzes the vanishing and exploding gradient problems in RNNs, explaining their causes and proposing gradient clipping as a solution (see the sketch after these references).
Deep Learning, Ian Goodfellow, Yoshua Bengio, and Aaron Courville, 2016 (MIT Press) - An authoritative textbook providing a comprehensive theoretical background on recurrent neural networks and their training difficulties, including a detailed discussion of exploding gradients.
CS224N: Natural Language Processing with Deep Learning, Lecture Notes 06: Recurrent Neural Networks, Tatsunori Hashimoto (slides mostly from Christopher Manning's 2023 version), 2023 (Stanford University) - High-quality lecture notes from a leading university course, offering a clear, pedagogical treatment of recurrent neural networks, including the exploding gradient problem and its practical implications.
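Since the first entry above centers on gradient clipping, a minimal sketch of the norm-rescaling rule from Pascanu et al. (2013) may be useful: whenever the L2 norm of the gradient exceeds a threshold, rescale the gradient so its norm equals that threshold. The function name `clip_gradient`, the default threshold, and the toy gradient below are illustrative assumptions, not taken from any of the cited sources.

```python
import numpy as np

def clip_gradient(grad, threshold=1.0):
    """Norm clipping in the style of Pascanu et al. (2013):
    if the L2 norm of the gradient exceeds the threshold,
    rescale the gradient so its norm equals the threshold."""
    norm = np.linalg.norm(grad)
    if norm > threshold:
        grad = grad * (threshold / norm)
    return grad

# Illustrative usage: a gradient with norm 5 is rescaled to norm 1
# before it would be used in a parameter update.
g = np.array([3.0, 4.0])
print(clip_gradient(g, threshold=1.0))  # [0.6 0.8]
```

Deep learning frameworks ship the same idea built in, e.g. `torch.nn.utils.clip_grad_norm_` in PyTorch, which rescales all of a model's parameter gradients jointly rather than one array at a time.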