Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, 2017. Advances in Neural Information Processing Systems (NeurIPS). DOI: 10.48550/arXiv.1706.03762 - This foundational paper introduces the Transformer architecture and the concept of sinusoidal positional embeddings, which are directly adapted for timestep conditioning in diffusion models.
Denoising Diffusion Probabilistic Models, Jonathan Ho, Ajay Jain, Pieter Abbeel, 2020. Advances in Neural Information Processing Systems (NeurIPS). DOI: 10.48550/arXiv.2006.11239 - This seminal paper introduces the Denoising Diffusion Probabilistic Models (DDPM) framework, detailing the U-Net architecture for noise prediction and the integration of timestep information via embeddings.
U-Net Architecture and Timestep Embeddings in Diffusers, Hugging Face, 2023 (Hugging Face) - This documentation section details the U-Net architecture used in diffusion models and specifically illustrates how timestep embeddings are created and integrated to condition the U-Net's noise prediction.