Sequence to Sequence Learning with Neural Networks, Ilya Sutskever, Oriol Vinyals, Quoc V. Le, 2014. Advances in Neural Information Processing Systems 27 (NIPS 2014). DOI: 10.48550/arXiv.1409.3215 - A foundational paper that introduced the sequence-to-sequence model using LSTMs for machine translation, showcasing its ability to map input sequences to output sequences of different lengths.
Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation, Kyunghyun Cho, Bart van Merrienboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, Yoshua Bengio, 2014. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). DOI: 10.48550/arXiv.1406.1078 - This paper concurrently introduced the RNN Encoder-Decoder framework, demonstrating its applicability to machine translation and introducing the gated recurrent unit (GRU), both central to the section's discussion; a minimal sketch of the framework follows this list.
Long Short-Term Memory, Sepp Hochreiter, Jürgen Schmidhuber, 1997. Neural Computation, Vol. 9, No. 8 (MIT Press). DOI: 10.1162/neco.1997.9.8.1735 - The original paper introducing Long Short-Term Memory (LSTM) networks, which are explicitly mentioned as a core recurrent architecture used within the sequence-to-sequence framework.
Deep Learning, Ian Goodfellow, Yoshua Bengio, Aaron Courville, 2016 (MIT Press) - Chapter 10 offers comprehensive theoretical explanations of recurrent neural networks, LSTMs, and the encoder-decoder framework, providing valuable context for the models covered.
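The first three entries share the encoder-decoder idea: one recurrent network reads the source sequence into a fixed-size hidden state, and a second recurrent network unrolls that state into a target sequence whose length may differ. Below is a minimal sketch of that pattern, written here in PyTorch with GRU layers; the framework choice, class name, and all dimensions are illustrative assumptions, not details taken from the cited papers.

```python
# A minimal encoder-decoder sketch (an assumption: PyTorch with GRUs;
# none of the cited papers used this exact setup).
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Encoder compresses the source sequence into a fixed-size state;
    decoder unrolls that state into a target sequence whose length
    may differ from the source's."""

    def __init__(self, src_vocab, tgt_vocab, emb_dim=64, hid_dim=128):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb_dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb_dim)
        self.encoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.decoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, tgt_vocab)

    def forward(self, src, tgt):
        # Encode: keep only the final hidden state (no attention).
        _, state = self.encoder(self.src_emb(src))
        # Decode with teacher forcing: feed the gold target tokens,
        # initializing the decoder from the encoder's final state.
        dec_out, _ = self.decoder(self.tgt_emb(tgt), state)
        return self.out(dec_out)  # logits over the target vocabulary

# Toy usage: batch of 2, source length 5, target length 7.
model = Seq2Seq(src_vocab=100, tgt_vocab=120)
src = torch.randint(0, 100, (2, 5))
tgt = torch.randint(0, 120, (2, 7))
logits = model(src, tgt)  # shape: (2, 7, 120)
```

Sutskever et al. used stacked LSTMs rather than a single GRU layer, and later systems add attention; the sketch shows only the fixed-size-state handoff between encoder and decoder that the annotations above refer to.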