Addressing Sequential Challenges with LSTMs and GRUs
Long Short-Term Memory, Sepp Hochreiter and Jürgen Schmidhuber, 1997. Neural Computation, Vol. 9 (The MIT Press). DOI: 10.1162/neco.1997.9.8.1735 - Introduces the Long Short-Term Memory (LSTM) architecture, providing a foundational understanding of its design and effectiveness in learning long-range dependencies.
Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation, Kyunghyun Cho, Bart van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio, 2014. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) (Association for Computational Linguistics). DOI: 10.3115/v1/D14-1179 - This seminal paper introduced the Gated Recurrent Unit (GRU) as a simpler alternative to the LSTM, detailing its architecture and performance in sequence modeling tasks.
Deep Learning, Ian Goodfellow, Yoshua Bengio, and Aaron Courville, 2016 (MIT Press). - A comprehensive textbook covering the theoretical foundations and practical aspects of deep learning, including detailed explanations of recurrent neural networks, LSTMs, and GRUs.