As we explored in previous chapters, diffusion models excel at learning the underlying distribution of a dataset p(x) and generating high-fidelity samples by reversing a gradual noising process. Starting from pure Gaussian noise xT, the reverse process iteratively denoises it to produce a sample x0 that looks like it came from the original data.
However, this standard generation process is unconditional. While it produces realistic outputs, it doesn't offer explicit control over what specific kind of output is generated. If you train a diffusion model on a diverse dataset of animals, running the standard sampling procedure might yield an image of a dog, a cat, a bird, or any other animal present in the training data. You effectively get a random sample from the learned distribution p(x), but you cannot directly ask the model to generate, say, only images of cats.
This lack of direct control limits the practical applicability of unconditional models in many scenarios. Often, we need to guide the generation process based on specific requirements or inputs. Consider these common use cases:

- Text-to-image generation, where a written prompt describes the desired output.
- Class-conditional generation, where we want samples from only one category (for example, only cats).
- Image-to-image tasks such as inpainting, super-resolution, or sketch-to-photo translation, where another image supplies the constraint.
In all these examples, the goal is not just to sample from the overall data distribution p(x), but rather to sample from a conditional distribution p(x∣y), where y represents the conditioning information. This conditioning variable y could be:

- A discrete class label drawn from a fixed set of categories.
- A text description or caption.
- Another image or structural signal, such as a segmentation mask, a sketch, or a low-resolution version of the target.
Therefore, we need mechanisms to incorporate this conditioning information y into the diffusion model's generation process. We need ways to steer the iterative denoising steps so that the final output x0 not only looks realistic (belongs to the data manifold) but also aligns with the provided condition y.
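Before looking at specific guidance techniques, it helps to see how simple the plumbing for y can be. A common pattern is to embed the condition and combine it with the timestep embedding that the denoising network already receives at every step. The sketch below illustrates this with plain numpy; the table sizes, random "learned" embeddings, and the additive combination are illustrative assumptions, not a prescribed architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy lookup tables standing in for learned embeddings (hypothetical sizes).
NUM_CLASSES, NUM_STEPS, EMB_DIM = 10, 1000, 16
class_emb = rng.normal(size=(NUM_CLASSES, EMB_DIM))  # one row per class label y
time_emb = rng.normal(size=(NUM_STEPS, EMB_DIM))     # one row per timestep t

def conditioning_vector(t: int, y: int) -> np.ndarray:
    """Combine the timestep embedding with the embedding of the condition y.

    A denoising network would consume this vector at every reverse step,
    so its prediction depends on both t and y.
    """
    return time_emb[t] + class_emb[y]

cond = conditioning_vector(t=500, y=3)
print(cond.shape)  # (16,)
```

In a real model, both tables would be learned jointly with the network, and richer conditions (text, images) would pass through their own encoders before being combined.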
This chapter focuses on precisely these mechanisms. We will investigate techniques that allow us to exert control over the diffusion model's output, transforming it from a generator of random samples into a controllable synthesis engine. We'll start with classifier guidance, where an external classifier's gradients steer the sampling process, and then move to more integrated approaches like classifier-free guidance, which has become a standard technique for conditional diffusion models.
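As a preview of classifier-free guidance, the core sampling-time operation is a single interpolation: the model produces both a conditional and an unconditional noise prediction, and the two are combined with a guidance weight. The numpy sketch below uses a random stand-in for the trained network; the weight convention here (w=1 recovers the purely conditional prediction, w>1 over-emphasizes the condition) is one common implementation choice.

```python
import numpy as np

def eps_model(x_t, t, y=None):
    """Stand-in for a trained noise-prediction network.

    y=None plays the role of the "null" condition used for the
    unconditional prediction. A deterministic seed keeps the toy
    output reproducible for a given (t, y) pair.
    """
    seed = hash((t, y)) % (2**32)
    return np.random.default_rng(seed).normal(size=x_t.shape)

def guided_eps(x_t, t, y, w=3.0):
    """Classifier-free guidance: push the unconditional prediction
    toward the conditional one with guidance weight w."""
    eps_uncond = eps_model(x_t, t, y=None)
    eps_cond = eps_model(x_t, t, y=y)
    return eps_uncond + w * (eps_cond - eps_uncond)

x_t = np.random.default_rng(0).normal(size=(4,))  # a toy "noisy sample"
eps_hat = guided_eps(x_t, t=10, y=3, w=3.0)
print(eps_hat.shape)  # (4,)
```

This guided prediction simply replaces the raw network output inside the usual reverse-process update; the rest of the sampler is unchanged.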
© 2025 ApX Machine Learning