The DDPM sampling algorithm provides a principled way to reverse the diffusion process. However, it often requires simulating many small steps (equal to the number of forward steps, T), which can be computationally expensive and slow. Imagine needing 1000 steps to generate a single image; this quickly becomes impractical for many applications.
Fortunately, Denoising Diffusion Implicit Models (DDIM), introduced by Song, Meng, and Ermon (2020), offer a more flexible and often significantly faster alternative. The key insight behind DDIM is that the objective used to train the noise prediction network $\epsilon_\theta$ does not actually depend on the Markovian structure of the forward process assumed by DDPM. DDIM exploits this by defining a different, non-Markovian generative (reverse) process that still uses the same trained network $\epsilon_\theta$.
The Core Idea of DDIM
Recall that the DDPM reverse step aims to approximate $p(x_{t-1} \mid x_t)$. DDIM takes a different approach. It starts from the property that, given the original data $x_0$ and the noise $\epsilon$ used to generate $x_t$, we can write:

$$x_t = \sqrt{\bar\alpha_t}\, x_0 + \sqrt{1 - \bar\alpha_t}\, \epsilon$$
We can rearrange this to get an estimate of the original data $x_0$ from the current noisy sample $x_t$ and the predicted noise $\epsilon_\theta(x_t, t)$:

$$\hat{x}_0(x_t, t) = \frac{x_t - \sqrt{1 - \bar\alpha_t}\, \epsilon_\theta(x_t, t)}{\sqrt{\bar\alpha_t}}$$

This $\hat{x}_0$ is a prediction of the final clean data point, made from the noisy intermediate $x_t$.
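As a quick illustration, this prediction is just an inversion of the forward equation. A minimal numpy sketch, where `eps_pred` stands in for the network output $\epsilon_\theta(x_t, t)$:

```python
import numpy as np

def predict_x0(x_t, eps_pred, alpha_bar_t):
    """Estimate the clean sample x0 by inverting the forward relation
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps."""
    return (x_t - np.sqrt(1.0 - alpha_bar_t) * eps_pred) / np.sqrt(alpha_bar_t)
```

If the noise prediction were exact, this would recover $x_0$ exactly; in practice it is an estimate that improves as $t$ decreases.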
DDIM then defines the step to $x_{t-1}$ by combining this predicted $\hat{x}_0$ with the predicted noise $\epsilon_\theta(x_t, t)$, in a way that allows for deterministic transitions and skipping steps. The general DDIM update rule is:

$$x_{t-1} = \underbrace{\sqrt{\bar\alpha_{t-1}}\, \hat{x}_0(x_t, t)}_{\text{predicted } x_0 \text{ scaled}} + \underbrace{\sqrt{1 - \bar\alpha_{t-1} - \sigma_t^2}\, \epsilon_\theta(x_t, t)}_{\text{direction pointing to } x_t} + \underbrace{\sigma_t z}_{\text{random noise}}$$

where $z \sim \mathcal{N}(0, I)$ is standard Gaussian noise and $\sigma_t$ controls the amount of stochasticity.
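The update rule can be sketched as a single numpy function. This is a schematic implementation under the assumption that `eps_pred` is the output of a trained noise predictor and the $\bar\alpha$ values come from a precomputed noise schedule:

```python
import numpy as np

def ddim_step(x_t, eps_pred, alpha_bar_t, alpha_bar_prev, sigma_t, z=None):
    """One DDIM update from x_t toward x_{t-1} (or x_{t'} for a jump).
    Combines the scaled x0 prediction, the 'direction to x_t' term,
    and optional random noise controlled by sigma_t."""
    x0_hat = (x_t - np.sqrt(1.0 - alpha_bar_t) * eps_pred) / np.sqrt(alpha_bar_t)
    direction = np.sqrt(1.0 - alpha_bar_prev - sigma_t**2) * eps_pred
    noise = sigma_t * z if (sigma_t > 0 and z is not None) else 0.0
    return np.sqrt(alpha_bar_prev) * x0_hat + direction + noise
```

A useful sanity check: with $\sigma_t = 0$ and an exact noise prediction, the step maps a sample at noise level $\bar\alpha_t$ precisely onto the same $(x_0, \epsilon)$ trajectory at level $\bar\alpha_{t-1}$.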
Deterministic Sampling ($\eta = 0$)
A significant feature of DDIM is that we can choose the standard deviation $\sigma_t$ freely. A common choice is to set $\sigma_t = 0$, which makes the update step completely deterministic given $x_t$ and the prediction $\epsilon_\theta(x_t, t)$:

$$x_{t-1} = \sqrt{\bar\alpha_{t-1}}\, \hat{x}_0(x_t, t) + \sqrt{1 - \bar\alpha_{t-1}}\, \epsilon_\theta(x_t, t)$$

This deterministic nature means that starting from the same initial noise $x_T$, we always generate the same final output $x_0$. This is quite different from DDPM, where the added noise $\sigma_t z$ at each step leads to variations even from the same starting $x_T$.
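To make the contrast concrete, here is a small numpy sketch (with made-up values standing in for a real model's prediction) showing that the $\sigma_t = 0$ update is reproducible, while a stochastic update with $\sigma_t > 0$ is not:

```python
import numpy as np

# Hypothetical values standing in for a real model's outputs.
x_t = np.array([0.4, -0.7])       # current noisy sample
eps_pred = np.array([0.2, 0.1])   # stand-in for epsilon_theta(x_t, t)
ab_t, ab_prev = 0.5, 0.8          # alpha-bar at t and t-1

def update(sigma, z):
    x0_hat = (x_t - np.sqrt(1 - ab_t) * eps_pred) / np.sqrt(ab_t)
    return (np.sqrt(ab_prev) * x0_hat
            + np.sqrt(1 - ab_prev - sigma**2) * eps_pred
            + sigma * z)

# Deterministic (sigma = 0): identical result on every call.
a = update(0.0, 0.0)
b = update(0.0, 0.0)

# Stochastic (sigma > 0): fresh noise changes the result.
rng = np.random.default_rng(0)
c = update(0.3, rng.standard_normal(2))
d = update(0.3, rng.standard_normal(2))
```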
Faster Sampling with Subsequences
The non-Markovian nature and the deterministic update rule (when $\sigma_t = 0$) allow DDIM to work effectively even when using only a subsequence of the original timesteps $[1, \dots, T]$. For instance, instead of taking 1000 steps, we might define a sampling sequence $S$ of only 50 or 100 timesteps, say $S = (T,\ T - T/N_{\text{steps}},\ T - 2T/N_{\text{steps}},\ \dots,\ 1)$, where $N_{\text{steps}}$ is the desired number of sampling steps (e.g., 50).
The DDIM update then proceeds by stepping between consecutive timesteps in this subsequence. If $t$ and $t'$ are two consecutive timesteps in $S$ (with $t' < t$), the update calculates $x_{t'}$ from $x_t$ using the same formulas, but with $t - 1$ replaced by $t'$. This allows much larger "jumps" in the denoising process, significantly accelerating generation.
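One simple way to build such a subsequence is a fixed stride from $T$ down toward 1; the exact spacing scheme varies between implementations, so this is just one plausible choice:

```python
def ddim_timesteps(T, n_steps):
    """Evenly spaced, descending subsequence of timesteps for DDIM,
    using a stride of roughly T // n_steps."""
    stride = max(T // n_steps, 1)
    steps = list(range(T, 0, -stride))
    if steps[-1] != 1:
        steps.append(1)  # make sure the sequence ends at t = 1
    return steps
```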
The DDIM Algorithm
Here is the procedure for sampling with DDIM, often with the deterministic setting ($\sigma_t = 0$):
Choose Timesteps: Select a subsequence of $N_{\text{steps}}$ timesteps $S = (s_1, s_2, \dots, s_{N_{\text{steps}}})$ from $(1, \dots, T)$ in descending order (e.g., $s_1 = T$, $s_2 \approx T - T/N_{\text{steps}}$, ..., $s_{N_{\text{steps}}} \approx 1$). Let $s_{N_{\text{steps}}+1} = 0$.
Initial Noise: Sample the starting noise $x_{s_1} = x_T \sim \mathcal{N}(0, I)$.
Iterative Denoising: For $i = 1, \dots, N_{\text{steps}}$:
Get the current timestep $t = s_i$ and the next (smaller) timestep $t' = s_{i+1}$ (so $t' = 0$ when $i = N_{\text{steps}}$).
Predict the noise using the trained model: $\epsilon_\theta(x_t, t)$.
Predict the clean data: $\hat{x}_0(x_t, t) = \dfrac{x_t - \sqrt{1 - \bar\alpha_t}\, \epsilon_\theta(x_t, t)}{\sqrt{\bar\alpha_t}}$.
Clip $\hat{x}_0$ if necessary (e.g., to $[-1, 1]$ for normalized images).
Calculate the standard deviation $\sigma_t$. A common generalization uses a parameter $\eta \in [0, 1]$:

$$\sigma_t = \eta \sqrt{\frac{1 - \bar\alpha_{t'}}{1 - \bar\alpha_t}} \sqrt{1 - \frac{\bar\alpha_t}{\bar\alpha_{t'}}}$$

Setting $\eta = 0$ gives $\sigma_t = 0$ (deterministic DDIM); setting $\eta = 1$ gives a stochastic version.
Calculate the next sample $x_{t'}$:

$$x_{t'} = \sqrt{\bar\alpha_{t'}}\, \hat{x}_0(x_t, t) + \sqrt{1 - \bar\alpha_{t'} - \sigma_t^2}\, \epsilon_\theta(x_t, t) + \sigma_t z$$

where $z \sim \mathcal{N}(0, I)$ if $\sigma_t > 0$, and $z = 0$ if $\sigma_t = 0$. Note that if $t' = 0$, then $\bar\alpha_{t'} = 1$ and the formula simplifies to $x_0 = \hat{x}_0(x_t, t)$.
Output: Return the final sample $x_0$.
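Putting these steps together, the whole procedure can be sketched in numpy. Here `eps_model` is a stand-in for the trained noise predictor $\epsilon_\theta$, and `alpha_bar` is assumed to map each timestep $t$ to its cumulative product $\bar\alpha_t$; both are assumptions of this sketch, not a specific library API:

```python
import numpy as np

def ddim_sample(eps_model, alpha_bar, T, n_steps, shape, eta=0.0, seed=0):
    """Full DDIM sampling loop (deterministic when eta = 0).

    eps_model(x, t): stand-in for the trained noise predictor.
    alpha_bar: dict mapping timestep t in 1..T to bar-alpha_t.
    """
    rng = np.random.default_rng(seed)
    stride = max(T // n_steps, 1)
    steps = list(range(T, 0, -stride))      # descending timestep subsequence
    if steps[-1] != 1:
        steps.append(1)
    x = rng.standard_normal(shape)          # x_T ~ N(0, I)
    for i, t in enumerate(steps):
        t_prev = steps[i + 1] if i + 1 < len(steps) else 0
        ab_t = alpha_bar[t]
        ab_prev = alpha_bar[t_prev] if t_prev > 0 else 1.0
        eps = eps_model(x, t)
        # Predict x0 and clip it to the (assumed) normalized data range.
        x0_hat = (x - np.sqrt(1 - ab_t) * eps) / np.sqrt(ab_t)
        x0_hat = np.clip(x0_hat, -1.0, 1.0)
        # Eta-interpolated step variance: 0 gives deterministic DDIM.
        sigma = eta * np.sqrt((1 - ab_prev) / (1 - ab_t)) * np.sqrt(1 - ab_t / ab_prev)
        z = rng.standard_normal(shape) if sigma > 0 else 0.0
        x = np.sqrt(ab_prev) * x0_hat + np.sqrt(1 - ab_prev - sigma**2) * eps + sigma * z
    return x
```

With `eta=0` the loop performs 50 network evaluations instead of 1000, and repeated calls with the same seed produce identical samples.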
Comparison of DDPM and deterministic DDIM ($\eta = 0$) sampling. DDPM uses small, stochastic steps based on the Markovian assumption. DDIM calculates a predicted $x_0$ at each step and uses it to take larger, deterministic steps along a chosen subsequence $S$, significantly reducing the number of required network evaluations.
The ability to use far fewer steps ($N_{\text{steps}} \ll T$) makes DDIM a very popular choice for practical applications where generation speed matters. In the next section, we'll discuss the trade-offs between these two sampling methods.
Denoising Diffusion Implicit Models, Jiaming Song, Chenlin Meng, Stefano Ermon, 2020. International Conference on Learning Representations (ICLR). DOI: 10.48550/arXiv.2010.02502 - The original research paper introducing the DDIM algorithm, detailing its non-Markovian generative process, deterministic sampling, and ability to use subsequences for faster generation.
Score-Based Generative Modeling through Stochastic Differential Equations, Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, Ben Poole, 2021. International Conference on Learning Representations (ICLR). DOI: 10.48550/arXiv.2011.13456 - Provides a unifying theoretical framework for diffusion models, including DDPM and DDIM, by connecting them to SDEs and ODEs. This helps explain the theoretical underpinnings of DDIM's deterministic and faster sampling.