Scalable Diffusion Models with Transformers, William Peebles, Saining Xie, 2023, Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), DOI: 10.48550/arXiv.2212.09748 - Introduces the Diffusion Transformer (DiT) architecture and its adaLN-Zero conditioning mechanism (adaptive layer normalization with zero-initialized modulation), and shows that sample quality improves consistently as model compute increases.
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby, 2021, International Conference on Learning Representations (ICLR), DOI: 10.48550/arXiv.2010.11929 - Presents Vision Transformers (ViTs) and the methodology of converting images into sequences of patches for transformer processing, which is fundamental to DiT's input handling.
Hugging Face Diffusers Library, Hugging Face, 2024 - Official documentation for a widely used open-source library that provides pre-trained diffusion models and tools for implementing and training custom diffusion models, including DiTs.