Non-local Neural Networks, Xiaolong Wang, Ross Girshick, Abhinav Gupta, Kaiming He, 2018. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE Computer Society. DOI: 10.1109/CVPR.2018.00813 - Introduces the non-local neural network architecture, its generic formulation, and concrete instantiations; the original source for this section.
Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, 2017. Advances in Neural Information Processing Systems (NeurIPS), Vol. 30. DOI: 10.48550/arXiv.1706.03762 - Proposes the Transformer architecture and the self-attention mechanism, which the non-local operation generalizes and is closely related to.
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby, 2021. International Conference on Learning Representations (ICLR). DOI: 10.48550/arXiv.2010.11929 - Introduces the Vision Transformer (ViT) model, showing how the Transformer architecture and self-attention can be applied effectively to image classification, building on the idea of global context modeling.