Long Short-Term Memory, Sepp Hochreiter and Jürgen Schmidhuber, 1997. Neural Computation, Vol. 9, No. 8 (MIT Press). DOI: 10.1162/neco.1997.9.8.1735 - The original paper introducing Long Short-Term Memory (LSTM) networks, fundamental for addressing the vanishing gradient problem in recurrent neural networks and crucial for effective sequence prediction.
Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation, Kyunghyun Cho, Bart van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio, 2014. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) (Association for Computational Linguistics). DOI: 10.3115/v1/D14-1179 - This paper introduces the Gated Recurrent Unit (GRU), a simpler alternative to the LSTM, and the encoder-decoder architecture, which is highly relevant for many-to-many sequence prediction tasks.
Deep Learning, Ian Goodfellow, Yoshua Bengio, and Aaron Courville, 2016 (MIT Press) - An authoritative textbook offering a comprehensive theoretical foundation on recurrent neural networks, LSTMs, and GRUs, covering their mechanics and applications in sequence modeling and prediction.
Deep Learning with Python, François Chollet, 2021 (Manning Publications) - A practical guide for implementing deep learning models, including recurrent neural networks, for various sequence prediction tasks such as time series forecasting using the Keras framework.
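To make the gating mechanism described in the Hochreiter and Schmidhuber entry concrete, the following is a minimal NumPy sketch of a single LSTM cell step. All weight shapes, names, and the stacked-parameter layout here are illustrative assumptions for exposition, not taken from any of the cited works.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step. W, U, b hold stacked parameters for the
    input (i), forget (f), and output (o) gates and the candidate (g).
    (Illustrative layout; real libraries may order gates differently.)"""
    z = W @ x + U @ h_prev + b           # pre-activations, shape (4*hidden,)
    n = h_prev.shape[0]
    i = sigmoid(z[0 * n:1 * n])          # input gate
    f = sigmoid(z[1 * n:2 * n])          # forget gate
    o = sigmoid(z[2 * n:3 * n])          # output gate
    g = np.tanh(z[3 * n:4 * n])          # candidate cell state
    # Additive cell-state update: this is what mitigates vanishing gradients.
    c = f * c_prev + i * g
    h = o * np.tanh(c)                   # hidden state exposed to the next layer
    return h, c

# Tiny usage example with random weights (illustrative only).
rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
W = rng.standard_normal((4 * n_hid, n_in)) * 0.1
U = rng.standard_normal((4 * n_hid, n_hid)) * 0.1
b = np.zeros(4 * n_hid)
h = np.zeros(n_hid)
c = np.zeros(n_hid)
for x in rng.standard_normal((5, n_in)):  # a length-5 input sequence
    h, c = lstm_step(x, h, c, W, U, b)
print(h.shape)  # (4,)
```

In practice one would use a framework implementation (e.g. the recurrent layers covered in the Chollet book) rather than hand-rolling the cell; the sketch only shows why the forget gate's multiplicative control over an additive cell state lets gradients survive long sequences.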