An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit and Neil Houlsby, 2020. International Conference on Learning Representations (ICLR). DOI: 10.48550/arXiv.2010.11929 - Introduces the Vision Transformer (ViT), showing how Transformers can be applied to image data by splitting an image into patches and processing them as a sequence.
Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, 2017. Advances in Neural Information Processing Systems 30 (NeurIPS). DOI: https://doi.org/10.48550/arXiv.1706.03762 - Introduces the Transformer architecture, laying the foundation for self-attention and sequence-to-sequence models that do not rely on recurrence or convolution.