Long Short-Term Memory, Sepp Hochreiter and Jürgen Schmidhuber, 1997. Neural Computation, Vol. 9, No. 8 (The MIT Press). DOI: 10.1162/neco.1997.9.8.1735 - Introduces Long Short-Term Memory (LSTM) networks, a recurrent architecture designed to address the vanishing gradient problem in recurrent neural networks and learn long-term dependencies.
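For context (a sketch, not equations taken from the paper itself): the now-standard LSTM cell update is usually written as below, where the forget gate f_t is a slightly later extension of the original 1997 design. The additive update of the cell state c_t is what lets error signals propagate across many time steps without vanishing.

\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}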
Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation, Kyunghyun Cho, Bart van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio, 2014. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Association for Computational Linguistics. DOI: 10.3115/v1/D14-1179 - Presents the Gated Recurrent Unit (GRU), an alternative to the LSTM that achieves comparable performance with fewer parameters.
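To make the "fewer parameters" point concrete, here is a minimal Python sketch (not from either paper) counting per-layer parameters under the standard modern formulations: an LSTM layer has four gate blocks (input, forget, output, candidate) while a GRU layer has three (update, reset, candidate). It assumes one bias vector per block; real implementations may differ slightly (PyTorch, for example, keeps separate input and recurrent biases).

# Sketch: compare per-layer parameter counts of LSTM vs. GRU.
def lstm_param_count(input_size: int, hidden_size: int) -> int:
    # Each of the 4 blocks has input weights, recurrent weights, and a bias.
    per_block = hidden_size * input_size + hidden_size * hidden_size + hidden_size
    return 4 * per_block

def gru_param_count(input_size: int, hidden_size: int) -> int:
    # Same block structure, but only 3 blocks.
    per_block = hidden_size * input_size + hidden_size * hidden_size + hidden_size
    return 3 * per_block

if __name__ == "__main__":
    d, h = 256, 512  # arbitrary example sizes, chosen only for illustration
    print(lstm_param_count(d, h))  # 1,574,912
    print(gru_param_count(d, h))   # 1,181,184 (roughly 3/4 of the LSTM)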
Deep Learning, Ian Goodfellow, Yoshua Bengio, and Aaron Courville, 2016. MIT Press - A comprehensive textbook on deep learning fundamentals, including optimization, regularization, and training methods for neural networks.