Diffusion models excel at learning the underlying distribution of training data and generating diverse samples. However, there is often a need for more control over the generation process. For instance, generating an image of a specific object class (like a "cat" or a "dog") or synthesizing data possessing particular attributes falls within the domain of conditional generation. Two prominent techniques for achieving this control in diffusion models are Classifier Guidance and Classifier-Free Guidance (CFG).
Classifier Guidance uses a separate, pre-trained classifier model to steer the diffusion sampling process towards samples that exhibit desired characteristics, typically defined by a class label $y$. The core idea is to modify the sampling steps to not only denoise the image but also make it more recognizable as class $y$ according to the classifier.
Recall that the reverse diffusion process aims to approximate the score function $\nabla_{\mathbf{x}_t} \log p(\mathbf{x}_t)$, which guides the sampling from noise towards data. To incorporate conditioning on a class $y$, we want to sample from the conditional distribution $p(\mathbf{x}_t \mid y)$. Using Bayes' theorem, we can relate the conditional score to the unconditional score and the classifier's prediction:

$$\log p(\mathbf{x}_t \mid y) = \log p(y \mid \mathbf{x}_t) + \log p(\mathbf{x}_t) - \log p(y)$$
Taking the gradient with respect to $\mathbf{x}_t$ (the $\log p(y)$ term vanishes, since it does not depend on $\mathbf{x}_t$) gives:

$$\nabla_{\mathbf{x}_t} \log p(\mathbf{x}_t \mid y) = \nabla_{\mathbf{x}_t} \log p(\mathbf{x}_t) + \nabla_{\mathbf{x}_t} \log p(y \mid \mathbf{x}_t)$$
Here, $\nabla_{\mathbf{x}_t} \log p(\mathbf{x}_t)$ is the score estimated by the unconditional diffusion model, and $\nabla_{\mathbf{x}_t} \log p_\phi(y \mid \mathbf{x}_t)$ is the gradient of the log-likelihood provided by a classifier $p_\phi$ trained to predict the class $y$ from a noisy input $\mathbf{x}_t$.
In practice, for models parameterized via noise prediction $\epsilon_\theta(\mathbf{x}_t, t)$, the update direction during sampling is adjusted. Since the noise prediction relates to the score via $\epsilon_\theta(\mathbf{x}_t, t) \approx -\sqrt{1 - \bar{\alpha}_t}\, \nabla_{\mathbf{x}_t} \log p(\mathbf{x}_t)$, the standard prediction is modified to incorporate the classifier's gradient. A common formulation for the guided noise prediction is:

$$\hat{\epsilon}_\theta(\mathbf{x}_t, t) = \epsilon_\theta(\mathbf{x}_t, t) - s \sqrt{1 - \bar{\alpha}_t}\, \nabla_{\mathbf{x}_t} \log p_\phi(y \mid \mathbf{x}_t)$$
Here, $s$ is the guidance scale, a hyperparameter that controls the strength of the conditioning. A higher value of $s$ pushes the generation process more strongly towards samples that the classifier recognizes as belonging to class $y$.
Mechanism: At each step of the reverse diffusion process, the classifier examines the current noisy sample $\mathbf{x}_t$ and calculates how changes to $\mathbf{x}_t$ would increase the probability of the target class $y$. This gradient information is then used to nudge the denoising step, effectively biasing the generation towards the desired class.
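The PyTorch sketch below shows what one guided noise prediction could look like. It is a minimal illustration, not a specific library's API: `eps_model(x_t, t)` (the diffusion model's noise predictor) and `classifier(x_t, t)` (a noise-aware classifier returning class logits) are hypothetical callables.

```python
import torch

def classifier_guided_eps(eps_model, classifier, x_t, t, y, s, alpha_bar_t):
    """One classifier-guided noise prediction (sketch).

    eps_model and classifier are hypothetical; alpha_bar_t is the
    cumulative product of alphas at timestep t.
    """
    # Unconditional noise prediction from the diffusion model.
    eps = eps_model(x_t, t)

    # Gradient of log p(y | x_t) with respect to x_t, via the classifier.
    with torch.enable_grad():
        x_in = x_t.detach().requires_grad_(True)
        logits = classifier(x_in, t)
        log_probs = torch.log_softmax(logits, dim=-1)
        selected = log_probs[torch.arange(len(y), device=y.device), y]
        grad = torch.autograd.grad(selected.sum(), x_in)[0]

    # Guided prediction: eps_hat = eps - s * sqrt(1 - alpha_bar_t) * grad
    return eps - s * (1.0 - alpha_bar_t) ** 0.5 * grad
```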
Advantages:

- A pre-trained unconditional diffusion model can be conditioned without retraining it; only a classifier needs to be trained.
- The guidance strength $s$ can be tuned freely at inference time.
- Different classifiers can be swapped in to target different label sets.
Disadvantages:

- Requires training a separate classifier on noisy inputs $\mathbf{x}_t$ at every noise level; off-the-shelf classifiers trained on clean images do not work well.
- Classifier gradients on noisy inputs can be noisy or adversarial, which can introduce artifacts.
- Each sampling step incurs an extra classifier forward and backward pass.
Classifier-Free Guidance (CFG) emerged as a way to achieve conditional generation without relying on a separate classifier model. It has become a widely adopted and highly effective technique, particularly prominent in large-scale models like those used for text-to-image synthesis.
Mechanism: The central idea is to train a single conditional diffusion model, typically parameterized as $\epsilon_\theta(\mathbf{x}_t, t, y)$, which takes the conditioning information $y$ (e.g., a class label, a text embedding) as an additional input. During training, the conditioning input is randomly replaced with a special null token $\emptyset$ (representing unconditional generation) with some probability (e.g., 10-20% of the time). This forces the model to learn both the conditional noise prediction $\epsilon_\theta(\mathbf{x}_t, t, y)$ and the unconditional noise prediction $\epsilon_\theta(\mathbf{x}_t, t, \emptyset)$ within the same set of weights $\theta$.
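As a rough sketch of how this conditional dropout can be wired into a training step, assuming a hypothetical noise-prediction model `model(x_t, t, y)`, a hypothetical forward-diffusion helper `q_sample`, and the (assumed, not universal) convention that index `num_classes` serves as the null token:

```python
import torch
import torch.nn.functional as F

def cfg_training_loss(model, q_sample, x_0, y, num_classes,
                      num_timesteps=1000, p_uncond=0.1):
    """One CFG training step (sketch). model and q_sample are hypothetical."""
    b = x_0.shape[0]
    t = torch.randint(0, num_timesteps, (b,), device=x_0.device)
    noise = torch.randn_like(x_0)
    x_t = q_sample(x_0, t, noise)  # forward-diffuse x_0 to timestep t

    # Conditional dropout: replace y with the null token ~10% of the time
    # so the model also learns the unconditional prediction.
    drop = torch.rand(b, device=x_0.device) < p_uncond
    y = torch.where(drop, torch.full_like(y, num_classes), y)

    return F.mse_loss(model(x_t, t, y), noise)
```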
During sampling, both the conditional and unconditional noise predictions are computed at each step. The final noise prediction used for the denoising step is then calculated by extrapolating from the unconditional prediction in the direction of the conditional prediction:

$$\hat{\epsilon}_\theta(\mathbf{x}_t, t, y) = \epsilon_\theta(\mathbf{x}_t, t, \emptyset) + s \left( \epsilon_\theta(\mathbf{x}_t, t, y) - \epsilon_\theta(\mathbf{x}_t, t, \emptyset) \right)$$
Again, $s$ is the guidance scale (often denoted as $w$ in the literature).
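A minimal sampling-side sketch, reusing the hypothetical conditional model from the training example above and batching the two predictions into a single forward pass:

```python
import torch

@torch.no_grad()
def cfg_eps(model, x_t, t, y, null_token, s):
    """Classifier-free guided noise prediction (sketch)."""
    # Run the conditional and unconditional passes as one batched call.
    x_in = torch.cat([x_t, x_t])
    t_in = torch.cat([t, t])
    y_in = torch.cat([y, torch.full_like(y, null_token)])

    eps_cond, eps_uncond = model(x_in, t_in, y_in).chunk(2)

    # Extrapolate from the unconditional prediction toward the conditional one.
    return eps_uncond + s * (eps_cond - eps_uncond)
```

Batching the conditional and unconditional inputs is a common implementation choice: it keeps a single model call per step at the cost of doubling the effective batch size.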
Intuition: The term $\epsilon_\theta(\mathbf{x}_t, t, y) - \epsilon_\theta(\mathbf{x}_t, t, \emptyset)$ can be seen as implicitly representing the direction related to the condition $y$ in the noise prediction space. CFG effectively learns this direction directly from the data during training, rather than relying on an external classifier's gradient. Scaling this difference by $s$ strengthens the influence of the condition on the generation outcome.
Diagram illustrating the Classifier-Free Guidance mechanism during a single sampling step. Both unconditional ($\epsilon_\theta(\mathbf{x}_t, t, \emptyset)$) and conditional ($\epsilon_\theta(\mathbf{x}_t, t, y)$) noise predictions are computed from the current state $\mathbf{x}_t$. The final guided prediction $\hat{\epsilon}_\theta$ is an extrapolation based on these two predictions and the guidance scale $s$.
Advantages:

- No separate classifier is needed; a single model provides both conditional and unconditional predictions.
- Generally produces higher-quality samples with fewer artifacts than classifier guidance, and underpins most state-of-the-art text-to-image systems.
- The guidance direction is learned from the data itself rather than from an external model's gradients.
Disadvantages:

- Requires two forward passes of the diffusion model per sampling step (conditional and unconditional), roughly doubling inference cost, though both can be batched together.
- The conditioning mechanism is baked into the model at training time, so the training procedure must be modified and new condition types require retraining.
- Large values of $s$ trade diversity for fidelity and can produce oversaturated or distorted samples.
| Feature | Classifier Guidance | Classifier-Free Guidance (CFG) |
|---|---|---|
| External Model | Yes (classifier $p_\phi(y \mid \mathbf{x}_t)$) | No |
| Training | Standard diffusion model + separate classifier training (on noisy data) | Modified diffusion model training (with conditional dropout) |
| Inference Speed | Needs diffusion model + classifier evaluation per step | Needs diffusion model evaluation twice per step (cond + uncond) |
| Typical Quality | Good, but sensitive to classifier quality & can have artifacts | Often state-of-the-art, generally higher quality and fewer artifacts |
| Implementation | Requires integrating two models | Single model, modified training loop |
| Flexibility | Can swap classifiers (if trained) | Guidance baked into the model |
The Guidance Scale ($s$): In both methods, the guidance scale $s$ (or $w$) plays a significant role. It controls the trade-off between sample fidelity to the condition and sample diversity/realism. In the CFG formulation above, $s = 0$ recovers unconditional sampling, $s = 1$ recovers the plain conditional model, and $s > 1$ extrapolates beyond the conditional prediction, increasing adherence to the condition at the cost of diversity.
Finding an optimal value for $s$ usually requires empirical tuning for a specific model and task. It provides a powerful knob to adjust the generation behavior at inference time without retraining the model.
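In practice, libraries expose this knob directly. For example, with Hugging Face's diffusers (assuming it is installed and a CUDA device is available; the checkpoint name below is just an example), the CFG scale is a single argument at inference time:

```python
from diffusers import StableDiffusionPipeline

# Example checkpoint; any Stable Diffusion-compatible model id works.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5"
).to("cuda")

prompt = "a photograph of a cat wearing a top hat"
# Sweep the guidance scale: lower values favor diversity and realism,
# higher values favor prompt fidelity (with possible artifacts).
for s in [1.5, 5.0, 7.5, 12.0]:
    image = pipe(prompt, guidance_scale=s).images[0]
    image.save(f"cat_cfg_{s}.png")
```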
Guidance techniques are essential for directing the output of diffusion models towards specific desired properties, moving past simple unconditional generation. Classifier guidance uses an external classifier to inject conditioning information via gradients, while Classifier-Free Guidance achieves this more effectively by modifying the training process of the diffusion model itself, enabling it to learn conditional and unconditional generation simultaneously. CFG has become the standard approach due to its superior performance and elimination of the need for a separate, potentially problematic classifier model. Understanding and utilizing these guidance mechanisms is fundamental for applying diffusion models to practical conditional synthesis tasks.