Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, 2017. Advances in Neural Information Processing Systems (NeurIPS 2017), Vol. 30. DOI: 10.48550/arXiv.1706.03762 - This paper introduces the Transformer architecture and the self-attention mechanism, which are the core components that Diffusion Transformers (DiT) use to model global dependencies.