So far, we've seen how diffusion models can generate data by reversing a noising process, starting from pure random noise x_T ∼ N(0, I). This produces samples reflecting the training distribution, but without specific control over the output.
Often, however, we want to guide the generation process. For instance, we might want to generate an image of a specific object class or create an image based on a text description. This chapter introduces methods for conditional generation, allowing us to influence the output of the diffusion model based on additional information, often denoted as conditioning y.
We will examine techniques like classifier guidance, where a separate classification model helps steer the sampling towards a desired attribute. We will then study classifier-free guidance (CFG), a widely used method that achieves similar control without needing an external classifier, by modifying the training and sampling procedures. Finally, we'll touch upon the fundamentals of conditioning on text prompts, a key technique behind modern text-to-image models.
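As a rough preview of the idea behind classifier guidance, Bayes' rule splits the score of the conditional distribution into the unconditional score plus a classifier term (standard notation, not specific to this chapter's implementation):

$$\nabla_{x_t} \log p(x_t \mid y) = \nabla_{x_t} \log p(x_t) + \nabla_{x_t} \log p(y \mid x_t)$$

Classifier guidance estimates the second term with a separate classifier trained on noisy inputs and adds it, usually scaled, to the diffusion model's score during sampling.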
By the end of this chapter, you will understand how to implement and apply these guidance techniques to gain more control over the generative capabilities of diffusion models.
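To give a sense of what "implementing guidance" looks like in practice, the sampling-time core of classifier-free guidance is a simple blend of two noise predictions from the same model. Below is a minimal sketch, assuming a hypothetical `eps_model(x_t, t, cond)` that returns the predicted noise and yields the unconditional prediction when `cond` is `None`; the names are illustrative, not from this chapter's code:

```python
def cfg_noise_prediction(eps_model, x_t, t, cond, guidance_scale=7.5):
    """Classifier-free guidance: blend conditional and unconditional predictions.

    Assumes eps_model(x_t, t, cond) returns the predicted noise, and that
    passing cond=None gives the unconditional prediction (which requires the
    model to have been trained with the conditioning randomly dropped).
    """
    eps_cond = eps_model(x_t, t, cond)      # prediction using the conditioning y
    eps_uncond = eps_model(x_t, t, None)    # prediction with conditioning dropped
    # Push the estimate away from the unconditional prediction and toward the
    # conditional one; guidance_scale = 1.0 recovers ordinary conditional sampling.
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)
```

Larger guidance scales typically trade sample diversity for closer adherence to the conditioning; the sections below cover how the model is trained so that the unconditional path is meaningful.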
6.1 Motivation for Conditional Generation
6.2 Classifier Guidance
6.3 Classifier-Free Guidance (CFG)
6.4 Implementing Classifier-Free Guidance
6.5 Text Conditioning Basics
6.6 Architecture Modifications for Conditioning
6.7 Hands-on Practical: Applying Guidance