While Denoising Diffusion Probabilistic Models (DDPMs) achieve impressive generation quality, a significant drawback is their slow sampling speed: generating a single sample often requires hundreds or thousands of sequential denoising steps, corresponding to the number of noise levels $T$ used during training. This section examines techniques that accelerate sampling and refine the generation process, focusing primarily on Denoising Diffusion Implicit Models (DDIM) and the impact of variance schedules.
Denoising Diffusion Implicit Models (DDIM) for Faster Sampling
DDIM modifies the generative (reverse) process of DDPMs to allow much faster sampling, often reducing the number of required steps by a factor of 10-100 without retraining the model. The innovation lies in formulating a non-Markovian reverse process that still uses the same noise prediction network $\epsilon_\theta$ trained with the DDPM objective.
Recall the standard DDPM reverse step:
$$p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\!\left(x_{t-1};\ \mu_\theta(x_t, t),\ \tilde{\beta}_t I\right)$$
where $\mu_\theta$ depends on $\epsilon_\theta(x_t, t)$ and the variance $\tilde{\beta}_t$ is fixed by the noise schedule $\beta_t$. This process is Markovian, meaning $x_{t-1}$ depends only on $x_t$.
DDIM introduces a more general family of non-Markovian diffusion processes. The core idea involves first predicting the final clean data point $x_0$ from the current noisy state $x_t$, and then using this prediction to guide the step towards $x_{t-1}$. The predicted $x_0$ is obtained by rearranging the forward process equation $x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon$:
$$x_0^{\text{pred}}(t) = \frac{1}{\sqrt{\bar{\alpha}_t}} \left( x_t - \sqrt{1 - \bar{\alpha}_t}\, \epsilon_\theta(x_t, t) \right)$$
This predicted $x_0$ represents the model's best estimate of the original data given the noisy input $x_t$ and the current time step $t$.
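As a minimal sketch (assuming NumPy, a noisy sample `x_t`, the network's noise prediction `eps_pred`, and the cumulative product $\bar{\alpha}_t$ are already available; the function name is illustrative), this prediction is a one-line rearrangement:

```python
import numpy as np

def predict_x0(x_t, eps_pred, alpha_bar_t):
    """Estimate the clean sample x_0 from the noisy sample x_t.

    Rearranges x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps,
    substituting the network's prediction eps_pred for the true noise eps.
    """
    return (x_t - np.sqrt(1.0 - alpha_bar_t) * eps_pred) / np.sqrt(alpha_bar_t)
```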
The DDIM reverse step then samples $x_{t-1}$ using this predicted $x_0$:
$$x_{t-1} = \sqrt{\bar{\alpha}_{t-1}}\, x_0^{\text{pred}}(t) + \underbrace{\sqrt{1 - \bar{\alpha}_{t-1} - \sigma_t^2}\; \epsilon_\theta(x_t, t)}_{\text{direction pointing to } x_t} + \underbrace{\sigma_t\, \epsilon'}_{\text{random noise}}$$
Here, $\epsilon' \sim \mathcal{N}(0, I)$ is fresh random noise, and $\sigma_t$ controls the stochasticity of this reverse step. The parameter $\sigma_t$ is defined using a hyperparameter $\eta \ge 0$:
$$\sigma_t(\eta) = \eta \sqrt{\frac{1 - \bar{\alpha}_{t-1}}{1 - \bar{\alpha}_t}} \sqrt{1 - \frac{\bar{\alpha}_t}{\bar{\alpha}_{t-1}}}$$
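A sketch of a single DDIM reverse step combining the three terms above; it assumes the cumulative products for the current and previous selected time steps are given as `alpha_bar_t` and `alpha_bar_prev`, and reuses the `predict_x0` helper from the earlier sketch. The names are illustrative, not from any particular library:

```python
import numpy as np

def ddim_sigma(alpha_bar_t, alpha_bar_prev, eta):
    """sigma_t(eta): eta = 1 matches the DDPM variance, eta = 0 is deterministic."""
    return eta * np.sqrt((1.0 - alpha_bar_prev) / (1.0 - alpha_bar_t)) \
               * np.sqrt(1.0 - alpha_bar_t / alpha_bar_prev)

def ddim_step(x_t, eps_pred, alpha_bar_t, alpha_bar_prev, eta=0.0, rng=None):
    """One DDIM reverse update from x_t to the previous selected time step."""
    x0_pred = predict_x0(x_t, eps_pred, alpha_bar_t)                 # model's estimate of x_0
    sigma = ddim_sigma(alpha_bar_t, alpha_bar_prev, eta)
    direction = np.sqrt(1.0 - alpha_bar_prev - sigma**2) * eps_pred  # points towards x_t
    noise = 0.0
    if eta > 0.0:                                                    # add noise only if stochastic
        rng = np.random.default_rng() if rng is None else rng
        noise = sigma * rng.standard_normal(x_t.shape)
    return np.sqrt(alpha_bar_prev) * x0_pred + direction + noise
```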
The key insight is the role of $\eta$:
Stochastic Case ($\eta = 1$): When $\eta = 1$, the value of $\sigma_t^2$ equals the DDPM variance $\tilde{\beta}_t = \frac{1 - \bar{\alpha}_{t-1}}{1 - \bar{\alpha}_t}\, \beta_t$. In this specific case, the DDIM sampling process recovers the original Markovian DDPM process.
Deterministic Case ($\eta = 0$): When $\eta = 0$, we have $\sigma_t = 0$. The random noise term vanishes, and the update becomes fully deterministic given $x_t$:
$$x_{t-1} = \sqrt{\bar{\alpha}_{t-1}}\, x_0^{\text{pred}}(t) + \sqrt{1 - \bar{\alpha}_{t-1}}\; \epsilon_\theta(x_t, t)$$
This makes the generative process implicit because $x_{t-1}$ is computed directly, not sampled from a distribution. This deterministic nature allows for significantly larger jumps in the time steps during sampling. Instead of using all $T$ steps (e.g., $T = 1000$), we can use a subsequence of time steps $\tau_1 < \tau_2 < \dots < \tau_S$ with $S \ll T$ (e.g., $S = 50$ or $S = 100$). The update rule is applied successively for $t = \tau_S, \tau_{S-1}, \dots, \tau_1$. This deterministic variant is often associated with the Probability Flow Ordinary Differential Equation (ODE) formulation of diffusion models.
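A sketch of the full sampling loop over such a subsequence, assuming a trained noise prediction function `eps_model(x, t)` and the length-$T$ array of cumulative products `alpha_bars`; it reuses `ddim_step` from above, and all names here are illustrative:

```python
import numpy as np

def ddim_sample(eps_model, alpha_bars, shape, num_steps=50, eta=0.0, seed=0):
    """Generate a sample with DDIM using only num_steps << T denoising steps."""
    rng = np.random.default_rng(seed)
    T = len(alpha_bars)
    # Evenly spaced subsequence tau_1 < ... < tau_S of the original T time steps.
    taus = np.linspace(0, T - 1, num_steps, dtype=int)
    x = rng.standard_normal(shape)            # start from pure noise, x_T ~ N(0, I)
    for i in range(num_steps - 1, 0, -1):     # walk tau_S, tau_{S-1}, ..., tau_2
        t, t_prev = taus[i], taus[i - 1]
        eps_pred = eps_model(x, t)
        x = ddim_step(x, eps_pred, alpha_bars[t], alpha_bars[t_prev], eta, rng)
    return x
```

With $\eta = 0$ the trajectory is fully determined by the initial noise, which is what makes the large jumps between selected time steps behave well in practice.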
Comparison between the DDPM reverse step and the deterministic DDIM reverse step ($\eta = 0$). DDIM uses an intermediate prediction of the clean data $x_0$ to determine $x_{t-1}$.
Using $\eta = 0$ (deterministic DDIM) typically yields high-quality samples with far fewer steps. Values of $\eta$ between 0 and 1 allow interpolating between deterministic and stochastic generation, potentially adding diversity at the cost of some consistency. A major advantage of DDIM is that it uses the exact same network $\epsilon_\theta$ trained for DDPMs. Only the sampling procedure changes, making it easy to deploy for faster generation with existing models.
Variance Schedules
The choice of the noise schedule, defined by $\beta_t$ for $t = 1, \dots, T$, is another important aspect influencing model performance. This schedule determines how quickly noise is added in the forward process, controlling the signal-to-noise ratio at each step $t$. Common schedules include:
Linear Schedule: $\beta_t$ increases linearly from a small value $\beta_1$ (e.g., $10^{-4}$) to a larger value $\beta_T$ (e.g., $0.02$). This was used in the original DDPM paper.
Cosine Schedule: Proposed to improve training stability and sample quality. The cumulative noise level $\bar{\alpha}_t$ follows a cosine shape, preventing the signal from decaying too quickly early in the forward process. Specifically:
$$\bar{\alpha}_t = \frac{f(t)}{f(0)}, \qquad f(t) = \cos^2\!\left(\frac{t/T + s}{1 + s} \cdot \frac{\pi}{2}\right)$$
Here, $s$ is a small offset (e.g., $0.008$) to prevent $\beta_t$ from being too small near $t = 0$. The per-step variance is then derived as $\beta_t = 1 - \frac{\bar{\alpha}_t}{\bar{\alpha}_{t-1}}$.
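The two schedules can be sketched as follows (a NumPy illustration, not taken from any specific codebase; the clipping of $\beta_t$ near $t = T$ is a common practical safeguard rather than part of the definition):

```python
import numpy as np

def linear_beta_schedule(T, beta_1=1e-4, beta_T=0.02):
    """Linear schedule from beta_1 to beta_T, as in the original DDPM setup."""
    return np.linspace(beta_1, beta_T, T)

def cosine_beta_schedule(T, s=0.008, max_beta=0.999):
    """Cosine schedule: bar(alpha)_t follows a squared-cosine curve."""
    t = np.arange(T + 1)
    f = np.cos(((t / T) + s) / (1 + s) * np.pi / 2) ** 2
    alpha_bar = f / f[0]
    betas = 1.0 - alpha_bar[1:] / alpha_bar[:-1]   # beta_t = 1 - bar(alpha)_t / bar(alpha)_{t-1}
    return np.clip(betas, 0.0, max_beta)           # avoid degenerate steps near t = T
```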
The signal rate $\sqrt{\bar{\alpha}_t}$ decreases over time. With a linear schedule for $\beta_t$, $\bar{\alpha}_t$ falls towards zero relatively quickly, whereas a cosine schedule keeps the signal rate higher for longer, with $\bar{\alpha}_t$ declining roughly linearly in the middle of the process and changing little near the extremes $t = 0$ and $t = T$.
Beyond fixed schedules, some research has explored learning the variance of the reverse process $p_\theta(x_{t-1} \mid x_t)$. The original DDPM fixes this variance to $\tilde{\beta}_t I$ or $\beta_t I$. However, the model $\epsilon_\theta$ can be modified to also predict a parameter $v$ that interpolates between these lower and upper bounds on the optimal reverse variance. While learning the variance can improve log-likelihood scores, it often doesn't significantly enhance perceptual quality (measured by metrics like FID) and adds complexity. The fixed, small variance approach (often approximated by $\tilde{\beta}_t$) generally works well in practice. The DDIM framework sidesteps explicit variance learning by controlling stochasticity via $\eta$, offering a flexible way to manage the reverse-process variance implicitly.
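For reference, a sketch of the log-space interpolation used when the reverse variance is learned, assuming the network also outputs a per-element weight `v` in $[0, 1]$ alongside the noise prediction (the function name and exact parameterization here are illustrative of the Improved DDPM approach, not a definitive implementation):

```python
import numpy as np

def interpolated_variance(v, beta_t, beta_tilde_t):
    """Reverse-process variance from a predicted interpolation weight v.

    v = 1 recovers the upper bound beta_t, v = 0 the lower bound beta_tilde_t;
    interpolating the log-variances keeps the result positive and stable.
    """
    log_var = v * np.log(beta_t) + (1.0 - v) * np.log(beta_tilde_t)
    return np.exp(log_var)
```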
"In summary, DDIM provides a powerful method for accelerating diffusion model sampling by defining a deterministic or near-deterministic reverse path, leveraging the same trained noise prediction network. The choice of variance schedule (βt) remains an important design decision affecting model performance, with cosine schedules often being preferred over linear ones. These techniques collectively make diffusion models more practical for applications requiring efficient generation."
Denoising Diffusion Implicit Models. Jiaming Song, Chenlin Meng, Stefano Ermon. ICLR 2021. DOI: 10.48550/arXiv.2010.02502 - Introduces DDIM, a non-Markovian generative process for diffusion models that enables significantly faster and deterministic sampling without retraining.
Denoising Diffusion Probabilistic Models. Jonathan Ho, Ajay Jain, Pieter Abbeel. NeurIPS 2020. DOI: 10.48550/arXiv.2006.11239 - The foundational paper that introduced Denoising Diffusion Probabilistic Models (DDPMs), defining the forward and reverse processes.
Improved Denoising Diffusion Probabilistic Models. Alexander Quinn Nichol, Prafulla Dhariwal. ICML 2021, PMLR Vol. 139. DOI: 10.48550/arXiv.2102.09672 - Proposes the cosine variance schedule and other improvements to DDPMs, enhancing sample quality and training stability.
Score-Based Generative Modeling through Stochastic Differential Equations. Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, Ben Poole. ICLR 2021. DOI: 10.48550/arXiv.2011.13456 - Presents a unified framework for score-based generative models and diffusion models, highlighting the connection between deterministic sampling (like DDIM with $\eta = 0$) and probability flow ODEs.