While the DDPM sampling algorithm we just discussed provides a principled way to reverse the diffusion process, it requires simulating one small step for each of the $T$ forward steps, which can be computationally expensive and slow. Imagine needing 1000 network evaluations to generate a single image; this quickly becomes impractical for many applications.
Fortunately, Denoising Diffusion Implicit Models (DDIM), introduced by Song, Meng, and Ermon (2020), offer a more flexible and often significantly faster alternative. The key insight behind DDIM is that the objective used to train the noise prediction network $\epsilon_\theta$ does not actually depend on the Markovian property of the forward process assumed by DDPM. DDIM exploits this by defining a different generative (reverse) process that is non-Markovian but still uses the same trained network $\epsilon_\theta$.
Recall that the DDPM reverse step aims to approximate $p_\theta(x_{t-1} \mid x_t)$. DDIM takes a different approach. It starts from the property that, given the original data $x_0$ and the noise $\epsilon$ used to generate $x_t$, we can write:

$$x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon$$

We can rearrange this to get an estimate of the original data $x_0$ based on the current noisy sample $x_t$ and the predicted noise $\epsilon_\theta(x_t, t)$:

$$\hat{x}_0(x_t, t) = \frac{x_t - \sqrt{1 - \bar{\alpha}_t}\, \epsilon_\theta(x_t, t)}{\sqrt{\bar{\alpha}_t}}$$

This $\hat{x}_0$ represents a prediction of the final clean data point, made from the noisy intermediate $x_t$.
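As a concrete illustration, here is a minimal sketch of this reconstruction in PyTorch-style Python. The function name and the `alpha_bar` array holding the cumulative products $\bar{\alpha}_t$ are illustrative assumptions, not part of any particular library:

```python
import torch

def predict_x0(x_t, t, eps_pred, alpha_bar):
    """Estimate the clean sample x0 from a noisy sample x_t.

    x_t:       noisy sample at timestep t, shape (batch, ...)
    t:         integer timestep
    eps_pred:  output of the trained noise network eps_theta(x_t, t)
    alpha_bar: 1-D tensor of cumulative products of alphas, length T
    """
    ab_t = alpha_bar[t]
    # Rearranged forward relation: x_t = sqrt(ab_t) * x0 + sqrt(1 - ab_t) * eps
    return (x_t - torch.sqrt(1.0 - ab_t) * eps_pred) / torch.sqrt(ab_t)
```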
DDIM then defines the step to $x_{t-1}$ by combining this predicted $\hat{x}_0$ with the predicted noise $\epsilon_\theta(x_t, t)$, in a way that permits deterministic transitions and step skipping. The general DDIM update rule is:
$$x_{t-1} = \underbrace{\sqrt{\bar{\alpha}_{t-1}}\, \hat{x}_0(x_t, t)}_{\text{predicted } x_0 \text{, scaled}} + \underbrace{\sqrt{1 - \bar{\alpha}_{t-1} - \sigma_t^2}\, \epsilon_\theta(x_t, t)}_{\text{direction pointing to } x_t} + \underbrace{\sigma_t z}_{\text{random noise}}$$

where $z \sim \mathcal{N}(0, I)$ is standard Gaussian noise, and $\sigma_t$ controls the amount of stochasticity.
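This update translates directly into code. The sketch below continues the previous snippet (reusing its `torch` import and the `predict_x0` helper); the function signature is again an assumption for illustration:

```python
def ddim_step(x_t, t, t_prev, eps_pred, alpha_bar, sigma_t):
    """One general DDIM update from timestep t to t_prev (t_prev < t).

    Setting sigma_t = 0 gives the fully deterministic update; larger
    values inject stochasticity into the transition.
    """
    # Convention: alpha_bar at "timestep -1" (the clean data) is 1.0
    ab_prev = alpha_bar[t_prev] if t_prev >= 0 else torch.tensor(1.0)

    # Predicted clean sample, reusing the helper defined above
    x0_hat = predict_x0(x_t, t, eps_pred, alpha_bar)

    # Scaled prediction of x0 plus the "direction pointing to x_t"
    mean = (torch.sqrt(ab_prev) * x0_hat
            + torch.sqrt(1.0 - ab_prev - sigma_t ** 2) * eps_pred)

    # Random-noise term, zero in the deterministic case
    if sigma_t > 0:
        return mean + sigma_t * torch.randn_like(x_t)
    return mean
```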
A significant feature of DDIM is that we are free to choose the standard deviation $\sigma_t$. A common choice is $\sigma_t = 0$, which makes the update step completely deterministic given $x_t$ and the prediction $\epsilon_\theta(x_t, t)$ (in the DDIM paper this choice is expressed through a parameter $\eta$, with $\eta = 0$ corresponding to $\sigma_t = 0$):
$$x_{t-1} = \sqrt{\bar{\alpha}_{t-1}}\, \hat{x}_0(x_t, t) + \sqrt{1 - \bar{\alpha}_{t-1}}\, \epsilon_\theta(x_t, t)$$

Substituting the expression for $\hat{x}_0(x_t, t)$:

$$x_{t-1} = \sqrt{\bar{\alpha}_{t-1}} \left( \frac{x_t - \sqrt{1 - \bar{\alpha}_t}\, \epsilon_\theta(x_t, t)}{\sqrt{\bar{\alpha}_t}} \right) + \sqrt{1 - \bar{\alpha}_{t-1}}\, \epsilon_\theta(x_t, t)$$

This deterministic nature means that starting from the same initial noise $x_T$, we will always generate the same final output $x_0$. This is quite different from DDPM, where the added noise $\sigma_t z$ at each step leads to variations even from the same starting $x_T$.
The non-Markovian nature and the deterministic update rule (when $\sigma_t = 0$) allow DDIM to work effectively even when using only a subsequence of the original timesteps $[1, \ldots, T]$. For instance, instead of taking 1000 steps, we might define a sampling sequence $S$ of only 50 or 100 timesteps, say $S = \left(T,\; T - \tfrac{T}{N_{\text{steps}}},\; T - \tfrac{2T}{N_{\text{steps}}},\; \ldots,\; 1\right)$, where $N_{\text{steps}}$ is the desired number of sampling steps (e.g., 50).
The DDIM update then proceeds by stepping between consecutive timesteps in this subsequence. If $t$ and $t'$ are two consecutive timesteps in $S$ (with $t' < t$), the update computes $x_{t'}$ from $x_t$ using the same formulas, but with $t-1$ replaced by $t'$. This allows much larger "jumps" in the denoising process, significantly accelerating generation. A concrete way to build such a subsequence is shown below.
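A minimal sketch of one common choice, linear spacing over the timesteps (the spacing scheme itself is a design decision, not fixed by DDIM):

```python
import numpy as np

T, n_steps = 1000, 50
# Evenly spaced timesteps from T-1 down to 0 (0-based indices),
# e.g. [999, 979, 958, ..., 20, 0]
timesteps = np.linspace(T - 1, 0, n_steps).round().astype(int)
```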
Putting these pieces together gives the complete procedure for sampling with DDIM, typically run with the deterministic setting ($\sigma_t = 0$).
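The sketch below ties the earlier snippets together into a full sampling loop. Here `model` stands for the trained noise network $\epsilon_\theta$ and its call signature is an assumed interface, as are the `predict_x0` and `ddim_step` helpers defined above:

```python
import numpy as np
import torch

@torch.no_grad()
def ddim_sample(model, alpha_bar, shape, n_steps=50, eta=0.0):
    """Generate one batch of samples with DDIM in n_steps << T steps.

    model:     trained noise predictor, assumed callable as model(x_t, t)
    alpha_bar: 1-D tensor of cumulative alpha products, length T
    eta:       0.0 gives deterministic DDIM; eta > 0 adds stochasticity
    """
    T = len(alpha_bar)
    # Evenly spaced subsequence S, from T-1 down to 0
    ts = np.linspace(T - 1, 0, n_steps).round().astype(int)

    x = torch.randn(shape)  # start from pure Gaussian noise x_T
    for i, t in enumerate(ts):
        t, t_prev = int(t), (int(ts[i + 1]) if i + 1 < len(ts) else -1)
        eps = model(x, torch.tensor([t]))

        # sigma_t from the DDIM paper's eta parameterization
        # (identically zero when eta == 0)
        ab_t = alpha_bar[t]
        ab_prev = alpha_bar[t_prev] if t_prev >= 0 else torch.tensor(1.0)
        sigma = eta * torch.sqrt((1.0 - ab_prev) / (1.0 - ab_t)
                                 * (1.0 - ab_t / ab_prev))

        x = ddim_step(x, t, t_prev, eps, alpha_bar, sigma)
    return x
```

With `eta=0.0` this loop is fully deterministic: running it twice from the same initial noise tensor produces the same sample.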
*Comparison of DDPM and deterministic DDIM ($\eta = 0$) sampling: DDPM takes many small, stochastic steps based on the Markovian assumption, while DDIM computes a predicted $\hat{x}_0$ at each step and uses it to take larger, deterministic steps along a chosen subsequence $S$, significantly reducing the number of required network evaluations.*
The ability to use far fewer steps ($N_{\text{steps}} \ll T$) makes DDIM a very popular choice for practical applications where generation speed is important. In the next section, we'll discuss the trade-offs between these two sampling methods.