Architectural Considerations for Diffusion Models (U-Net)
U-Net: Convolutional Networks for Biomedical Image Segmentation, Olaf Ronneberger, Philipp Fischer, Thomas Brox, 2015. Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, Vol. 9351. DOI: 10.48550/arXiv.1505.04597 - Introduces the U-Net architecture with an encoder-decoder structure and skip connections, which became foundational for image-to-image tasks, including noise prediction in diffusion models.
Denoising Diffusion Probabilistic Models, Jonathan Ho, Ajay Jain, Pieter Abbeel, 2020. Advances in Neural Information Processing Systems (NeurIPS), Vol. 33. DOI: 10.48550/arXiv.2006.11239 - Presents the seminal Denoising Diffusion Probabilistic Models (DDPM) framework, establishing the use of U-Net for noise prediction with a simplified loss function in the reverse diffusion process.
Diffusion Models Beat GANs on Image Synthesis, Prafulla Dhariwal, Alex Nichol, 2021. Advances in Neural Information Processing Systems (NeurIPS), Vol. 34. DOI: 10.48550/arXiv.2105.05233 - Details architectural choices for U-Nets in diffusion models, including self-attention layers, Group Normalization, and effective timestep conditioning, leading to significant improvements in image synthesis quality.
High-Resolution Image Synthesis with Latent Diffusion Models, Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, Björn Ommer, 2022. CVPR 2022. DOI: 10.48550/arXiv.2112.10752 - Introduces Latent Diffusion Models, which adapt the U-Net architecture to a latent space, incorporating self-attention and cross-attention mechanisms for high-resolution, conditional image generation.