Attention Mechanisms in U-Nets (Self-Attention, Cross-Attention)
Attention Is All You Need. Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, 2017. Advances in Neural Information Processing Systems (NeurIPS), Vol. 30. DOI: 10.48550/arXiv.1706.03762 - Introduced the Transformer architecture and multi-head self-attention, the foundation of modern attention mechanisms.
Non-local Neural Networks. Xiaolong Wang, Ross Girshick, Abhinav Gupta, Kaiming He, 2018. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). DOI: 10.48550/arXiv.1711.07971 - Presented non-local operations, a generalization of self-attention for capturing long-range dependencies in convolutional neural networks, relevant for enhancing U-Net context understanding.
High-Resolution Image Synthesis with Latent Diffusion Models. Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, Björn Ommer, 2022. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). DOI: 10.48550/arXiv.2112.10752 - Describes Latent Diffusion Models, which use a U-Net backbone augmented with cross-attention to integrate conditioning signals such as text prompts for high-quality image generation.
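The papers above share one core operation. As a minimal, framework-free sketch (NumPy only; the shapes and variable names are illustrative assumptions, not taken from any of the cited implementations), scaled dot-product attention computes softmax(QK^T / sqrt(d)) V. Self-attention, as in a U-Net's spatial attention blocks, draws Q, K, and V from the same flattened feature map; cross-attention, as in Latent Diffusion Models, draws K and V from an external conditioning sequence instead:

```python
import numpy as np

def attention(q, k, v):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = q.shape[-1]
    scores = q @ k.swapaxes(-2, -1) / np.sqrt(d)
    # Numerically stable softmax over the key axis.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
x = rng.standard_normal((64, 32))    # 64 spatial tokens (e.g. an 8x8 feature map), channel dim 32
ctx = rng.standard_normal((10, 32))  # 10 conditioning tokens (e.g. projected text embeddings)

self_out = attention(x, x, x)        # self-attention: Q, K, V all from the feature map
cross_out = attention(x, ctx, ctx)   # cross-attention: K, V from the conditioning signal

print(self_out.shape, cross_out.shape)  # (64, 32) (64, 32)
```

Note that in either case the output keeps the query's sequence length, which is why cross-attention can inject a short text prompt into a much larger spatial feature map without changing its shape. Real implementations add learned Q/K/V projections, multiple heads, and residual connections, as described in the Transformer paper.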