Recurrent Neural Networks (RNNs) and LSTMs in Flux
Long Short-Term Memory, Sepp Hochreiter and Jürgen Schmidhuber, 1997. Neural Computation, Vol. 9 (MIT Press). DOI: 10.1162/neco.1997.9.8.1735 - The original paper that introduced Long Short-Term Memory (LSTM) networks, offering a solution to the vanishing gradient problem in RNNs.
Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation, Kyunghyun Cho, Bart van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio, 2014. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) (Association for Computational Linguistics). DOI: 10.3115/v1/D14-1179 - This paper introduced the Gated Recurrent Unit (GRU), a simplified recurrent network architecture that often performs similarly to LSTMs.
Deep Learning, Ian Goodfellow, Yoshua Bengio, and Aaron Courville, 2016 (MIT Press) - A standard textbook that provides a thorough foundation in deep learning concepts, including detailed discussions of RNNs, LSTMs, and GRUs.
Recurrent Layers, The Flux.jl Community, 2025 (The Flux.jl Community) - Official documentation for Flux.jl's recurrent neural network layers, including RNN, LSTM, and GRU, with usage instructions and examples.
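To accompany the Flux.jl documentation reference above, here is a minimal sketch of how a recurrent layer is typically wired into a model. It assumes the stateful, timestep-by-timestep interface used by Flux versions before 0.15 (the recurrent API changed in later releases), and the layer sizes and toy data are arbitrary choices for illustration, not taken from the documentation itself.

```julia
using Flux

# A small sequence model: 4 input features per timestep, an LSTM with
# 8 hidden units, and a dense readout producing a single value.
model = Chain(LSTM(4 => 8), Dense(8 => 1))

# A toy sequence of 10 timesteps, each a 4-element Float32 vector.
seq = [rand(Float32, 4) for _ in 1:10]

Flux.reset!(model)                  # clear hidden state before a new sequence
outputs = [model(x) for x in seq]   # run the model one timestep at a time

@show size(outputs[end])            # (1,): prediction after the last timestep
```

Under this older interface, swapping `LSTM` for `GRU` or `RNN` keeps the same calling pattern; consult the current Flux.jl documentation for the sequence-oriented API used in newer releases.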