While Denoising Diffusion Probabilistic Models (DDPMs) achieve impressive generation quality, a significant drawback is their slow sampling speed. Generating a single sample often requires hundreds or thousands of sequential denoising steps, corresponding to the number of noise levels $T$ used during training. This section examines techniques designed to accelerate sampling and refine the generation process, focusing primarily on Denoising Diffusion Implicit Models (DDIM) and the impact of variance schedules.
DDIM modifies the generative (reverse) process of DDPMs to allow much faster sampling, often reducing the number of required steps by a factor of 10 to 100 without retraining the model. The innovation lies in formulating a non-Markovian reverse process that still uses the same noise prediction network $\epsilon_\theta$ trained with the DDPM objective.
Recall the standard DDPM reverse step:
$$p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\left(x_{t-1};\, \mu_\theta(x_t, t),\, \tilde{\beta}_t I\right)$$

where $\mu_\theta$ depends on $\epsilon_\theta(x_t, t)$ and the variance $\tilde{\beta}_t$ is fixed based on the noise schedule $\beta_t$. This process is Markovian, meaning $x_{t-1}$ depends only on $x_t$.
DDIM introduces a more general family of non-Markovian diffusion processes. The core idea is to first predict the final clean data point $x_0$ from the current noisy state $x_t$, and then use this prediction to guide the step toward $x_{t-1}$. The predicted $x_0$ is obtained by rearranging the forward process equation $x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon$:
$$x_0^{\text{pred}}(t) = \frac{1}{\sqrt{\bar{\alpha}_t}}\left(x_t - \sqrt{1 - \bar{\alpha}_t}\, \epsilon_\theta(x_t, t)\right)$$

This predicted $x_0$ represents the model's best estimate of the original data given the noisy input $x_t$ and the current time step $t$.
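To make this concrete, the prediction can be written as a small helper function. The sketch below is illustrative only: `eps_model` (the trained noise prediction network) and `alpha_bar` (a precomputed tensor of cumulative products $\bar{\alpha}_t$) are placeholder names, not part of any particular library.

```python
import torch

def predict_x0(x_t, t, eps_model, alpha_bar):
    """Estimate the clean sample x0 from the noisy sample x_t at step t.

    x_t:       noisy batch at step t, shape (B, ...)
    t:         integer timestep (shared by the whole batch here)
    eps_model: trained network predicting the noise eps_theta(x_t, t)
    alpha_bar: 1-D tensor of cumulative products alpha_bar[t]
    """
    t_batch = torch.full((x_t.shape[0],), t, device=x_t.device)
    eps = eps_model(x_t, t_batch)
    a_bar = alpha_bar[t]
    # Rearranged forward equation: x0 = (x_t - sqrt(1 - a_bar) * eps) / sqrt(a_bar)
    return (x_t - torch.sqrt(1.0 - a_bar) * eps) / torch.sqrt(a_bar)
```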
The DDIM reverse step then samples $x_{t-1}$ using this predicted $x_0$:
$$x_{t-1} = \sqrt{\bar{\alpha}_{t-1}}\, x_0^{\text{pred}}(t) + \underbrace{\sqrt{1 - \bar{\alpha}_{t-1} - \sigma_t^2}\; \epsilon_\theta(x_t, t)}_{\text{direction pointing to } x_t} + \underbrace{\sigma_t\, \epsilon'}_{\text{random noise}}$$

Here, $\epsilon' \sim \mathcal{N}(0, I)$ is fresh random noise, and $\sigma_t$ controls the stochasticity of this reverse step. The parameter $\sigma_t$ is defined using a hyperparameter $\eta \ge 0$:
$$\sigma_t(\eta) = \eta \sqrt{\frac{1 - \bar{\alpha}_{t-1}}{1 - \bar{\alpha}_t}} \sqrt{1 - \frac{\bar{\alpha}_t}{\bar{\alpha}_{t-1}}}$$

The key insight is the role of $\eta$:

- $\eta = 0$: $\sigma_t = 0$, so the random noise term vanishes and the reverse step becomes fully deterministic given the initial noise $x_T$. This is the DDIM sampler.
- $\eta = 1$: the variance matches that of the original DDPM reverse process, recovering stochastic ancestral sampling.
Comparison between the DDPM reverse step and the deterministic DDIM reverse step ($\eta = 0$). DDIM uses an intermediate prediction of the clean data $x_0$ to determine $x_{t-1}$.
Using $\eta = 0$ (deterministic DDIM) typically yields high-quality samples with far fewer steps. Values of $\eta$ between 0 and 1 interpolate between deterministic and stochastic generation, potentially adding diversity at the cost of some consistency. A major advantage of DDIM is that it uses the exact same network $\epsilon_\theta$ trained for DDPMs; only the sampling procedure changes, making it easy to deploy for faster generation with existing models, as the sketch below illustrates.
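Putting the pieces together, an accelerated sampler applies the DDIM update along a shortened subsequence of timesteps. This is a minimal sketch under the same assumptions as before (placeholder `eps_model` and `alpha_bar`); the subsequence here is chosen by simple uniform spacing, a common but not unique choice.

```python
import torch

@torch.no_grad()
def ddim_sample(eps_model, alpha_bar, shape, num_steps=50, eta=0.0, device="cpu"):
    """Generate samples with the DDIM update, using num_steps << T reverse steps."""
    T = alpha_bar.shape[0]
    # Evenly spaced subsequence of timesteps from T-1 down to 0.
    timesteps = torch.linspace(T - 1, 0, num_steps, dtype=torch.long)

    x = torch.randn(shape, device=device)  # start from pure Gaussian noise x_T
    for i, t in enumerate(timesteps):
        t = int(t)
        a_t = alpha_bar[t]
        # alpha_bar at the previous step in the subsequence (1.0 past the end).
        t_prev = int(timesteps[i + 1]) if i + 1 < num_steps else -1
        a_prev = alpha_bar[t_prev] if t_prev >= 0 else torch.tensor(1.0)

        t_batch = torch.full((shape[0],), t, device=device)
        eps = eps_model(x, t_batch)

        # Predict x0, then take the DDIM step toward x_{t-1}.
        x0_pred = (x - torch.sqrt(1.0 - a_t) * eps) / torch.sqrt(a_t)
        sigma = eta * torch.sqrt((1.0 - a_prev) / (1.0 - a_t)) \
                    * torch.sqrt(1.0 - a_t / a_prev)
        direction = torch.sqrt((1.0 - a_prev - sigma**2).clamp(min=0.0)) * eps
        noise = sigma * torch.randn_like(x) if eta > 0 else 0.0
        x = torch.sqrt(a_prev) * x0_pred + direction + noise
    return x
```

With `num_steps=50` and $T = 1000$, the loop visits only one in every twenty training timesteps, which is where the 10 to 100× speedup comes from.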
The choice of the noise schedule, defined by $\beta_t$ for $t = 1, \dots, T$, is another important aspect influencing model performance. This schedule determines how quickly noise is added in the forward process, controlling the signal-to-noise ratio at each step $t$. Common schedules include:

- Linear schedule: $\beta_t$ increases linearly from a small starting value (e.g., $10^{-4}$) to a larger final value (e.g., $0.02$), as used in the original DDPM paper.
- Cosine schedule: $\bar{\alpha}_t$ follows a squared-cosine curve, so noise is added more gradually near the start and end of the process, which often improves sample quality.
The square root of $\bar{\alpha}_t$, representing the signal rate, decreases over time. A linear schedule for $\beta_t$ results in a roughly linear decrease in $\sqrt{\bar{\alpha}_t}$, while a cosine schedule maintains a higher signal rate for longer before decaying more rapidly.
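The two schedules above can be constructed in a few lines. This sketch follows the commonly used formulations: linear endpoints of $10^{-4}$ and $0.02$ from the original DDPM paper, and the squared-cosine form with a small offset $s$ from Nichol and Dhariwal's improved DDPM work.

```python
import torch

def linear_schedule(T, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule (original DDPM endpoints)."""
    betas = torch.linspace(beta_start, beta_end, T)
    alpha_bar = torch.cumprod(1.0 - betas, dim=0)
    return betas, alpha_bar

def cosine_schedule(T, s=0.008):
    """Cosine schedule: define alpha_bar directly via a squared cosine,
    then recover betas from consecutive ratios (Nichol & Dhariwal, 2021)."""
    steps = torch.arange(T + 1, dtype=torch.float64)
    f = torch.cos((steps / T + s) / (1 + s) * torch.pi / 2) ** 2
    alpha_bar = f / f[0]
    betas = (1.0 - alpha_bar[1:] / alpha_bar[:-1]).clamp(max=0.999)
    return betas.float(), alpha_bar[1:].float()
```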
Beyond fixed schedules, some research has explored learning the variance of the reverse process $p_\theta(x_{t-1} \mid x_t)$. The original DDPM fixes this variance to $\tilde{\beta}_t I$ or $\beta_t I$. However, the model $\epsilon_\theta$ can be modified to also predict a parameter $v$ that interpolates between these lower and upper bounds on the optimal reverse variance. While learning the variance can improve log-likelihood scores, it often doesn't significantly enhance perceptual quality (measured by metrics like FID) and adds complexity. The fixed, small variance approach (often approximated by $\tilde{\beta}_t$) generally works well in practice. The DDIM framework elegantly sidesteps explicit variance learning by controlling stochasticity via $\eta$, offering a flexible way to manage the reverse process variance implicitly.
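For reference, Nichol and Dhariwal's interpolation operates in log space between the two fixed bounds. The helper below is purely illustrative and assumes precomputed `betas` and `alpha_bar` tensors from the schedule functions above.

```python
import torch

def learned_variance(v, t, betas, alpha_bar):
    """Interpolate in log space between the two fixed variance choices.

    v: network output in [0, 1] (one value per dimension in practice).
    Valid for t >= 1; beta_tilde at t = 0 is zero and is handled
    specially in practice.
    """
    # Lower bound: beta_tilde_t = (1 - alpha_bar_{t-1}) / (1 - alpha_bar_t) * beta_t
    beta_tilde = (1.0 - alpha_bar[t - 1]) / (1.0 - alpha_bar[t]) * betas[t]
    # Upper bound: beta_t itself. Mix the two in log space.
    log_var = v * torch.log(betas[t]) + (1.0 - v) * torch.log(beta_tilde)
    return torch.exp(log_var)
```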
In summary, DDIM provides a powerful method for accelerating diffusion model sampling by defining a deterministic or near-deterministic reverse path, leveraging the same trained noise prediction network. The choice of variance schedule ($\beta_t$) remains an important design decision affecting model performance, with cosine schedules often being preferred over linear ones. These techniques collectively make diffusion models more practical for real-world applications requiring efficient generation.