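Sequence to Sequence Learning with Neural Networks, Ilya Sutskever, Oriol Vinyals, Quoc V. Le, 2014. Advances in Neural Information Processing Systems, Vol. 27 (NeurIPS) - Introduces the sequence-to-sequence model built on a fixed-length context vector, which makes clear the architectural limitation discussed in this section.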
Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, Illia Polosukhin, 2017. Advances in Neural Information Processing Systems, Vol. 30 (Curran Associates, Inc.) - Introduces the Transformer architecture, which relies entirely on attention mechanisms and processes sequences efficiently without recurrence.