Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation, Kyunghyun Cho, Bart van Merrienboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio, 2014. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). DOI: 10.3115/v1/D14-1179 - This foundational paper introduced the Gated Recurrent Unit (GRU), a gated hidden unit that is simpler to compute than the LSTM, for sequence modeling within an RNN encoder-decoder framework.