You've learned about two primary methods for sampling from a trained diffusion model: Denoising Diffusion Probabilistic Models (DDPM) and Denoising Diffusion Implicit Models (DDIM). While both leverage the trained noise prediction network to reverse the diffusion process, they operate differently, leading to distinct trade-offs. Choosing between them depends on your specific needs regarding generation speed, sample quality, and the desired level of randomness.
Let's compare DDPM and DDIM across several important dimensions.
DDPM sampling, as originally formulated, requires simulating the reverse Markov chain step-by-step. If the model was trained for T timesteps (often T=1000 or more), DDPM sampling involves T sequential evaluations of the neural network. This makes the process computationally intensive and relatively slow, as each step depends on the previous one.
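To make this concrete, here is a minimal sketch of the DDPM ancestral sampling loop. It assumes a trained noise-prediction network `eps_model` and a precomputed `betas` schedule; the function and variable names are illustrative and not taken from any particular library.

```python
import torch

@torch.no_grad()
def ddpm_sample(eps_model, shape, betas):
    """Ancestral DDPM sampling: one network evaluation per timestep, from t = T-1 down to 0."""
    alphas = 1.0 - betas                        # (T,)
    alpha_bars = torch.cumprod(alphas, dim=0)   # cumulative products, alpha_bar_t
    T = len(betas)

    x = torch.randn(shape)                      # start from pure Gaussian noise x_T
    for t in reversed(range(T)):                # T sequential steps; each depends on the last
        t_batch = torch.full((shape[0],), t, dtype=torch.long)
        eps = eps_model(x, t_batch)             # predicted noise eps_theta(x_t, t)

        # Mean of the reverse transition given x_t and the noise estimate.
        coef = (1 - alphas[t]) / torch.sqrt(1 - alpha_bars[t])
        mean = (x - coef * eps) / torch.sqrt(alphas[t])

        if t > 0:
            sigma_t = torch.sqrt(betas[t])      # one common choice of the variance term
            x = mean + sigma_t * torch.randn_like(x)  # fresh Gaussian noise at every step
        else:
            x = mean                            # no noise added on the final step
    return x
```

With T = 1000, this loop calls the network 1000 times in strict sequence, which is exactly where the cost comes from.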
DDIM offers a significant advantage here. Because DDIM formulates the generation process differently (it is related to solving an underlying differential equation), it doesn't strictly require visiting every timestep of the original forward sequence 1, ..., T. Instead, you can choose a subsequence of S < T timesteps (e.g., S = 50, 100, or 200) and perform the denoising updates only at those selected timesteps. This drastically reduces the number of required network evaluations, leading to much faster sample generation. For instance, DDIM with 100 steps is roughly 10 times faster than DDPM with 1000 steps, since the cost is dominated by the network evaluations.
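The sketch below illustrates this idea for the fully deterministic case (η = 0, discussed further below): it selects an evenly spaced subsequence of timesteps and jumps directly between them. As before, `eps_model` and the `alpha_bars` array are assumed, illustrative inputs rather than a specific library API.

```python
import torch

@torch.no_grad()
def ddim_sample(eps_model, shape, alpha_bars, num_steps=100):
    """Deterministic DDIM sampling (eta = 0) over an evenly spaced subsequence of timesteps."""
    T = len(alpha_bars)
    timesteps = torch.linspace(T - 1, 0, num_steps).long()   # e.g. 100 of the original 1000 steps

    x = torch.randn(shape)                                    # start from pure Gaussian noise x_T
    for i, t in enumerate(timesteps):
        t_batch = torch.full((shape[0],), int(t), dtype=torch.long)
        eps = eps_model(x, t_batch)

        # Predict x_0 from the current noisy sample and the noise estimate.
        x0_pred = (x - torch.sqrt(1 - alpha_bars[t]) * eps) / torch.sqrt(alpha_bars[t])

        if i + 1 < len(timesteps):
            t_prev = timesteps[i + 1]
            # Jump directly to the previous selected timestep; no noise is injected (eta = 0).
            x = torch.sqrt(alpha_bars[t_prev]) * x0_pred + torch.sqrt(1 - alpha_bars[t_prev]) * eps
        else:
            x = x0_pred                                       # final step returns the x_0 estimate
    return x
```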
Generally, DDPM sampling with a large number of steps (T=1000 or more) is known to produce high-fidelity samples that closely match the training data distribution. The slow, gradual denoising process with added stochasticity at each step contributes to this quality.
DDIM can often achieve comparable, or at least very good, sample quality with significantly fewer steps than DDPM. However, there's usually a trade-off: reducing the number of DDIM steps too much (e.g., below 20-50, depending on the model and dataset) can lead to a noticeable degradation in sample quality compared to a full DDPM run. The optimal number of DDIM steps often needs experimentation to balance speed and fidelity. For many applications, the quality from DDIM with 100-200 steps is sufficient and the speed gain is substantial.
Figure: hypothetical comparison of sample quality (a metric like FID, where lower is better) versus the number of sampling steps for DDPM and DDIM. DDIM reaches good quality with far fewer steps but may plateau at a slightly higher (worse) value than DDPM run for the maximum number of steps.
DDPM sampling is inherently stochastic. At each step t, the reverse transition p_θ(x_{t−1} | x_t) involves sampling, typically by adding Gaussian noise scaled by a variance term σ_t². This means that even starting from the same initial noise x_T, running the DDPM sampling process multiple times will produce different final samples x_0. This stochasticity contributes to the diversity of generated samples.
DDIM introduces a parameter, often denoted η (eta), which controls the amount of stochasticity in the sampling process. With η = 0, no fresh noise is injected at any step, so the reverse process becomes fully deterministic: the same starting noise x_T always maps to the same output x_0. With η = 1, the injected noise variance matches that of DDPM, recovering similar stochastic behavior.
This deterministic property (η = 0) is useful for applications requiring reproducibility, or for tasks like image inversion and manipulation where you want a predictable mapping between the latent noise and the generated image. Values between 0 and 1 interpolate between deterministic and stochastic generation.
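The sketch below shows how a single DDIM update can incorporate η, following the usual parameterization in which the injected noise variance shrinks to zero at η = 0. The helper name `ddim_step` and its argument names are hypothetical; `alpha_bar_t` and `alpha_bar_prev` are the cumulative products at the current and the previous selected timestep.

```python
import torch

def ddim_step(x, eps, alpha_bar_t, alpha_bar_prev, eta=0.0):
    """One DDIM update from timestep t back to an earlier timestep; eta controls stochasticity."""
    # Predict x_0 from the current sample x_t and the noise estimate eps.
    x0_pred = (x - torch.sqrt(1 - alpha_bar_t) * eps) / torch.sqrt(alpha_bar_t)

    # Standard deviation of the injected noise; exactly zero when eta == 0.
    sigma = eta * torch.sqrt((1 - alpha_bar_prev) / (1 - alpha_bar_t)) \
                * torch.sqrt(1 - alpha_bar_t / alpha_bar_prev)

    # Deterministic direction pointing toward the earlier timestep, plus optional fresh noise.
    dir_xt = torch.sqrt(1 - alpha_bar_prev - sigma ** 2) * eps
    noise = sigma * torch.randn_like(x) if eta > 0 else 0.0
    return torch.sqrt(alpha_bar_prev) * x0_pred + dir_xt + noise
```

Running the same subsequence of timesteps through this update with η = 0 twice from the same x_T yields identical outputs, while η > 0 yields different samples on each run.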
DDPM sampling follows the predefined sequence of timesteps T,T−1,...,1. It doesn't easily allow for varying the "size" of the steps taken during denoising.
DDIM's formulation provides more flexibility. By allowing sampling using a subsequence of the original timesteps, DDIM effectively allows for larger, non-uniform steps in the denoising process. This is mathematically justified by its connection to solving differential equations, where step sizes can be adapted.
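As an illustration, a timestep subsequence can be built with different spacings; the quadratic option below, which concentrates steps near t = 0 where fine details are resolved, is one commonly used alternative to uniform spacing. The function name and signature are illustrative.

```python
import numpy as np

def make_timestep_schedule(T=1000, num_steps=50, spacing="uniform"):
    """Build a decreasing subsequence of timesteps for DDIM-style sampling."""
    if spacing == "uniform":
        steps = np.linspace(T - 1, 0, num_steps)
    elif spacing == "quadratic":
        # Evenly spaced in sqrt(t): coarse steps near t = T, finer steps near t = 0.
        steps = np.linspace(np.sqrt(T - 1), 0, num_steps) ** 2
    else:
        raise ValueError(f"unknown spacing: {spacing}")
    return steps.round().astype(int)
```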
| Feature | DDPM | DDIM |
|---|---|---|
| Speed | Slow (requires T steps) | Fast (can use S << T steps) |
| Quality | High (especially with many steps) | Good (often comparable with fewer steps; degrades if too few) |
| Stochasticity | Always stochastic | Controlled by η; deterministic if η = 0 |
| Step size | Fixed (follows the original T steps) | Flexible (uses a subsequence of steps) |
| Use case | Maximum fidelity, diverse generation | Faster generation, interactive use, deterministic outputs |
Choosing Between DDPM and DDIM:
In practice, DDIM is very widely used due to its significant speed advantage while often maintaining excellent sample quality. Understanding these trade-offs allows you to select the sampling method that best aligns with the requirements of your specific generative task.