While Denoising Diffusion Probabilistic Models (DDPMs) provide a powerful framework for high-quality image generation, their iterative sampling process, often requiring hundreds or thousands of steps, presents a significant computational bottleneck. Denoising Diffusion Implicit Models (DDIMs) were introduced as a generalization of DDPMs specifically designed to address this limitation by enabling much faster sampling.
A significant departure from DDPMs lies in the nature of the reverse process. DDPMs assume a Markovian reverse process, $p_\theta(x_{0:T}) = p(x_T) \prod_{t=1}^{T} p_\theta(x_{t-1} \mid x_t)$, where each step $x_{t-1}$ depends only on the previous step $x_t$. DDIMs, however, leverage a more general, non-Markovian inference process. This seemingly small change has profound implications. Importantly, DDIMs utilize the exact same neural network, $\epsilon_\theta(x_t, t)$, trained with the standard DDPM objective (often the simplified version predicting noise). The innovation is entirely within the sampling procedure.
The core idea behind DDIM sampling stems from analyzing the conditional distribution $q(x_{t-1} \mid x_t, x_0)$ used in the DDPM forward process derivation. Recall that in DDPM, the reverse step $p_\theta(x_{t-1} \mid x_t)$ aims to approximate $q(x_{t-1} \mid x_t)$. DDIM instead designs a sampling process that directly uses properties derived from $q(x_{t-1} \mid x_t, x_0)$.
First, we can obtain an estimate of the initial data point $x_0$ given $x_t$ and the predicted noise $\epsilon_\theta(x_t, t)$. Using the forward process definition $x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon$ (where $\epsilon \sim \mathcal{N}(0, I)$ and $\bar{\alpha}_t = \prod_{i=1}^{t} \alpha_i = \prod_{i=1}^{t} (1 - \beta_i)$), we can rearrange to predict $x_0$:
$$\hat{x}_0(x_t, t) = \frac{1}{\sqrt{\bar{\alpha}_t}} \left( x_t - \sqrt{1 - \bar{\alpha}_t}\, \epsilon_\theta(x_t, t) \right)$$

This $\hat{x}_0$ represents the model's best guess of the original clean image given the noisy image $x_t$ at timestep $t$.
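This rearrangement is a one-liner in practice. The sketch below uses NumPy and a hypothetical `predict_x0` helper name; in a real sampler, `eps_pred` would come from the trained network $\epsilon_\theta(x_t, t)$.

```python
import numpy as np

def predict_x0(x_t, eps_pred, alpha_bar_t):
    """Estimate the clean sample x0 from a noisy x_t and predicted noise.

    Implements x0_hat = (x_t - sqrt(1 - alpha_bar_t) * eps) / sqrt(alpha_bar_t),
    the inversion of the forward process x_t = sqrt(ab_t) x0 + sqrt(1 - ab_t) eps.
    """
    return (x_t - np.sqrt(1.0 - alpha_bar_t) * eps_pred) / np.sqrt(alpha_bar_t)
```

If the noise prediction were exact, this recovers $x_0$ exactly; in practice it is the model's current best estimate, which improves as $t$ decreases.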
Now, instead of sampling $x_{t-1}$ from the approximate posterior $p_\theta(x_{t-1} \mid x_t)$, DDIM defines a direct sampling step using $\hat{x}_0$. The full DDIM update step, considering a subsequence of timesteps $t_i, t_{i-1}$ (where $i$ runs from $S$ down to $1$, with $t_S = T$ and $t_0 = 0$), is given by:
$$x_{t_{i-1}} = \underbrace{\sqrt{\bar{\alpha}_{t_{i-1}}}\, \hat{x}_0(x_{t_i}, t_i)}_{\text{direction to } x_0} + \underbrace{\sqrt{1 - \bar{\alpha}_{t_{i-1}} - \sigma_{t_i}^2}\, \epsilon_\theta(x_{t_i}, t_i)}_{\text{direction of noise}} + \underbrace{\sigma_{t_i} z_t}_{\text{random noise}}$$

Here, $z_t \sim \mathcal{N}(0, I)$ is fresh Gaussian noise, and $\sigma_{t_i}^2$ controls the stochasticity of the process. It is typically parameterized by $\eta$:
$$\sigma_{t_i}^2 = \eta^2 \, \frac{1 - \bar{\alpha}_{t_{i-1}}}{1 - \bar{\alpha}_{t_i}} \left( 1 - \frac{\bar{\alpha}_{t_i}}{\bar{\alpha}_{t_{i-1}}} \right)$$

A major feature of DDIM arises when setting the hyperparameter $\eta = 0$. This makes $\sigma_{t_i} = 0$, eliminating the random noise term $z_t$ and resulting in a deterministic update rule:
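The variance schedule translates directly into code. The helper name `ddim_sigma` below is illustrative; it returns the standard deviation $\sigma_{t_i}$ (squaring it gives the variance above), taking the cumulative products $\bar{\alpha}_{t_i}$ and $\bar{\alpha}_{t_{i-1}}$ as inputs.

```python
import numpy as np

def ddim_sigma(alpha_bar_t, alpha_bar_prev, eta):
    """Standard deviation sigma_{t_i} of the DDIM step, parameterized by eta.

    eta = 0 gives a fully deterministic sampler; eta = 1 reintroduces
    DDPM-like stochasticity over the chosen timestep subsequence.
    """
    return eta * np.sqrt((1.0 - alpha_bar_prev) / (1.0 - alpha_bar_t)) \
               * np.sqrt(1.0 - alpha_bar_t / alpha_bar_prev)
```

Note that $\sigma_{t_i}^2 \le 1 - \bar{\alpha}_{t_{i-1}}$ for $\eta \le 1$, so the coefficient $\sqrt{1 - \bar{\alpha}_{t_{i-1}} - \sigma_{t_i}^2}$ in the update step stays real.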
$$x_{t_{i-1}} = \sqrt{\bar{\alpha}_{t_{i-1}}}\, \hat{x}_0(x_{t_i}, t_i) + \sqrt{1 - \bar{\alpha}_{t_{i-1}}}\, \epsilon_\theta(x_{t_i}, t_i)$$

This deterministic nature means that, starting from the same initial noise $x_T$, the sampling process will always produce the exact same final image $x_0$. This property is valuable for tasks requiring reproducibility or manipulation of the latent space.
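A single deterministic update can be sketched as follows. This is a minimal NumPy illustration (the function name is ours, not a library API); `eps_pred` stands in for the output of the trained noise predictor $\epsilon_\theta(x_{t_i}, t_i)$.

```python
import numpy as np

def ddim_step_deterministic(x_t, eps_pred, alpha_bar_t, alpha_bar_prev):
    """One deterministic DDIM update (eta = 0) from timestep t_i to t_{i-1}."""
    # Predict x0 from the current noisy sample and the model's noise estimate
    x0_hat = (x_t - np.sqrt(1.0 - alpha_bar_t) * eps_pred) / np.sqrt(alpha_bar_t)
    # Move toward x0, then re-add the predicted noise at the smaller noise level
    return np.sqrt(alpha_bar_prev) * x0_hat + np.sqrt(1.0 - alpha_bar_prev) * eps_pred
```

Because no fresh noise is injected, applying this step repeatedly from a fixed $x_T$ always traces the same trajectory.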
Furthermore, the non-Markovian formulation allows DDIM to skip steps during sampling. While DDPM typically requires sampling across all $T$ timesteps (e.g., $T = 1000$), DDIM can use a much shorter subsequence $\tau = \{t_1, t_2, \dots, t_S\}$ where $S \ll T$ (e.g., $S = 20$, $50$, or $100$). The sampler jumps directly from $x_{t_i}$ to $x_{t_{i-1}}$, significantly reducing the number of required forward passes through the network $\epsilon_\theta$.
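Putting the pieces together, a complete DDIM sampling loop over a strided subsequence might look like the sketch below. It assumes `eps_model(x, t)` is the trained noise predictor and `alpha_bars` is the length-$T$ array of cumulative products $\bar{\alpha}_t$; the evenly spaced subsequence is one common choice, not the only one.

```python
import numpy as np

def ddim_sample(eps_model, alpha_bars, num_steps, shape, eta=0.0, seed=0):
    """Sketch of DDIM sampling over an evenly spaced timestep subsequence.

    eps_model(x, t): assumed trained noise predictor epsilon_theta.
    alpha_bars: array of cumulative products alpha_bar_t, t = 0..T-1.
    """
    rng = np.random.default_rng(seed)
    T = len(alpha_bars)
    # Subsequence t_S > ... > t_1 of S = num_steps timesteps, descending
    taus = np.linspace(0, T - 1, num_steps, dtype=int)[::-1]
    x = rng.standard_normal(shape)  # x_T ~ N(0, I)
    for i, t in enumerate(taus):
        ab_t = alpha_bars[t]
        # alpha_bar at the next (smaller) timestep; 1.0 conventionally at t = 0
        ab_prev = alpha_bars[taus[i + 1]] if i + 1 < len(taus) else 1.0
        eps = eps_model(x, t)
        x0_hat = (x - np.sqrt(1.0 - ab_t) * eps) / np.sqrt(ab_t)
        sigma = eta * np.sqrt((1.0 - ab_prev) / (1.0 - ab_t)) \
                    * np.sqrt(1.0 - ab_t / ab_prev)
        x = (np.sqrt(ab_prev) * x0_hat
             + np.sqrt(max(1.0 - ab_prev - sigma**2, 0.0)) * eps
             + sigma * rng.standard_normal(shape))
    return x
```

With `eta=0.0` the loop is fully deterministic given the seed for $x_T$, and each of the $S$ iterations costs exactly one network evaluation.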
Figure: Comparison of sampling paths for DDPM (top, blue) and DDIM (bottom, red). DDIM allows for significantly fewer steps ($S$) compared to the original number of diffusion steps ($T$).
The speedup offered by DDIM comes with a trade-off. While significantly faster, using fewer sampling steps ($S$) can sometimes lead to a slight reduction in sample quality or diversity compared to running the full DDPM process or using a larger $S$. The choice of $S$ and $\eta$ allows tuning this balance between speed and fidelity. Setting $\eta = 1$ recovers a process closely related to the original DDPM sampling (though still using the non-Markovian structure over the chosen subsequence), reintroducing stochasticity.
From a theoretical perspective, the deterministic DDIM ($\eta = 0$) process can be interpreted as approximating the solution trajectory of a specific probability flow Ordinary Differential Equation (ODE) related to the diffusion process. This connection bridges diffusion models with continuous-time generative models and provides a foundation for developing even more advanced ODE-based samplers, which we will explore in Chapter 6.
Understanding the DDIM sampling mechanism, its deterministic variant, and the ability to accelerate generation by skipping steps is fundamental. It not only provides a practical method for faster sampling with existing DDPM-trained models but also serves as a building block for many subsequent advancements in diffusion model sampling and distillation techniques covered later in this course.
© 2025 ApX Machine Learning