Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, 2017, Advances in Neural Information Processing Systems. DOI: 10.48550/arXiv.1706.03762 - Introduces the Transformer architecture and its self-attention mechanism, which are fundamental to the discussed autoencoders.
An Image Is Worth 16x16 Words: Transformers for Image Recognition at Scale, Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby, 2021, International Conference on Learning Representations (ICLR). DOI: 10.48550/arXiv.2010.11929 - Introduces the Vision Transformer (ViT) architecture, which provides the context for MAE's application in computer vision.