Sequence to Sequence Learning with Neural Networks, Ilya Sutskever, Oriol Vinyals, Quoc V. Le, 2014. Advances in Neural Information Processing Systems, Vol. 27 (NeurIPS) - Introduces the sequence-to-sequence model with a fixed-length context vector, establishing the architectural limitation discussed.
Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, Illia Polosukhin, 2017. Advances in Neural Information Processing Systems, Vol. 30 (Curran Associates, Inc.) - Introduces the Transformer architecture, which relies entirely on the attention mechanism to process sequences efficiently without recurrence.