Now that we understand how the diffusion model, specifically the U-Net, is trained to predict the noise ϵ present in a noisy input xt at timestep t, let's explore how we use this trained model to generate new data. The Denoising Diffusion Probabilistic Models (DDPM) algorithm provides the original, foundational procedure for this generative process.
As outlined in the chapter introduction, the core idea is to start with pure noise and progressively refine it using the learned denoising steps. We begin by sampling an initial tensor xT from a standard Gaussian distribution, xT∼N(0,I). This xT is pure noise at maximum entropy, corresponding to the end state of the forward diffusion process.
The DDPM sampling algorithm then iteratively applies the learned reverse process, stepping backward from timestep t=T down to t=1. In each step, the goal is to sample a slightly less noisy version xt−1 given the current state xt.
Recall from Chapter 3 that the reverse transition pθ(xt−1∣xt) is approximated by a Gaussian distribution whose mean μθ(xt,t) depends on xt and the predicted noise ϵθ(xt,t), and whose variance σt2 is related to the noise schedule βt.
The model ϵθ(xt,t) takes the current noisy image xt and the timestep t as input and outputs its prediction of the noise component that was added to get from x0 to xt. Using this prediction, we can estimate the mean of the distribution for the previous state xt−1. The equation for the mean μθ(xt,t) is derived from the properties of the forward and reverse processes:
$$\mu_\theta(x_t, t) = \frac{1}{\sqrt{\alpha_t}}\left(x_t - \frac{1 - \alpha_t}{\sqrt{1 - \bar{\alpha}_t}}\,\epsilon_\theta(x_t, t)\right)$$

Here, $\alpha_t = 1 - \beta_t$ and $\bar{\alpha}_t = \prod_{i=1}^{t} \alpha_i$ are parameters derived from the noise schedule $\beta_t$ used during the forward process. This equation essentially takes the current noisy sample $x_t$ and subtracts the scaled predicted noise $\epsilon_\theta$ to estimate the mean of the previous, less noisy state $x_{t-1}$.
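As a concrete sketch, the schedule quantities and this mean computation can be written in NumPy. The linear β schedule, its endpoints, and the value of T here are illustrative assumptions, not values prescribed by the text:

```python
import numpy as np

# Illustrative linear beta schedule (T and the endpoints are assumptions,
# not values prescribed by the text).
T = 1000
betas = np.linspace(1e-4, 0.02, T)   # beta_1, ..., beta_T
alphas = 1.0 - betas                 # alpha_t = 1 - beta_t
alpha_bars = np.cumprod(alphas)      # alpha_bar_t = prod_{i=1}^{t} alpha_i

def posterior_mean(x_t, eps_pred, t):
    """Mean mu_theta(x_t, t) of the reverse step, for 1-indexed timestep t."""
    a_t, ab_t = alphas[t - 1], alpha_bars[t - 1]
    return (x_t - (1.0 - a_t) / np.sqrt(1.0 - ab_t) * eps_pred) / np.sqrt(a_t)
```

Note that when the predicted noise is zero, the mean reduces to rescaling $x_t$ by $1/\sqrt{\alpha_t}$, as the formula implies.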
The variance of the reverse step, σt2, is also determined by the noise schedule. A common choice, derived to match the variance of the forward process posterior q(xt−1∣xt,x0), is:
$$\sigma_t^2 = \tilde{\beta}_t = \frac{1 - \bar{\alpha}_{t-1}}{1 - \bar{\alpha}_t}\,\beta_t$$

Note that $\sigma_t^2 = 0$ for $t = 1$ (since $\bar{\alpha}_0$ is defined as 1). To perform one denoising step and sample $x_{t-1}$ from $x_t$, we calculate the mean $\mu_\theta(x_t, t)$ using the model's prediction and then add Gaussian noise scaled by the standard deviation $\sigma_t$:
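This variance can be checked numerically. The sketch below again assumes an illustrative linear schedule; defining $\bar{\alpha}_0 = 1$ makes $\sigma_1^2$ come out exactly zero:

```python
import numpy as np

# Numerical check of the posterior variance, assuming an illustrative
# linear schedule. alpha_bar_0 is defined as 1, which forces sigma_1^2 = 0.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bars = np.cumprod(1.0 - betas)

def sigma_sq(t):
    """sigma_t^2 = beta_tilde_t for 1-indexed timestep t."""
    ab_prev = alpha_bars[t - 2] if t > 1 else 1.0   # alpha_bar_0 := 1
    return (1.0 - ab_prev) / (1.0 - alpha_bars[t - 1]) * betas[t - 1]
```

Since $\bar{\alpha}_{t-1} > \bar{\alpha}_t$, the ratio in the formula is below 1, so $\tilde{\beta}_t$ is always slightly smaller than $\beta_t$.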
$$x_{t-1} = \mu_\theta(x_t, t) + \sigma_t z$$

where $z \sim \mathcal{N}(0, I)$ is standard Gaussian noise. This added noise $z$ introduces stochasticity into the generation process, allowing the model to produce diverse samples even when starting from the same initial $x_T$ (though typically we start from different $x_T$ samples). However, for the very last step ($t = 1$), we usually set $z = 0$ to obtain the final deterministic mean prediction as our output $x_0$.
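A single denoising step can then be sketched as follows, assuming an illustrative linear β schedule and that the noise prediction for the current timestep is already available:

```python
import numpy as np

# Sketch of one DDPM denoising step, assuming an illustrative linear schedule
# and that eps_pred = epsilon_theta(x_t, t) has already been computed.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def ddpm_step(x_t, eps_pred, t, rng):
    """Sample x_{t-1} from x_t for 1-indexed timestep t."""
    a_t, ab_t = alphas[t - 1], alpha_bars[t - 1]
    mean = (x_t - (1.0 - a_t) / np.sqrt(1.0 - ab_t) * eps_pred) / np.sqrt(a_t)
    if t == 1:
        return mean                                  # last step: z = 0
    var = (1.0 - alpha_bars[t - 2]) / (1.0 - ab_t) * betas[t - 1]
    z = rng.standard_normal(x_t.shape)               # fresh Gaussian noise
    return mean + np.sqrt(var) * z
```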
Putting these steps together, the complete DDPM sampling algorithm proceeds as follows: sample $x_T \sim \mathcal{N}(0, I)$; then, for $t = T, T-1, \ldots, 1$, predict the noise $\epsilon_\theta(x_t, t)$, compute the mean $\mu_\theta(x_t, t)$, and sample $x_{t-1} = \mu_\theta(x_t, t) + \sigma_t z$ (with $z = 0$ at $t = 1$); finally, return $x_0$.
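The full loop can be sketched as below. The `eps_model` function is a placeholder standing in for the trained U-Net noise predictor, and the linear schedule is likewise an illustrative assumption:

```python
import numpy as np

# Sketch of the full DDPM sampling loop. eps_model is a dummy placeholder
# standing in for the trained U-Net noise predictor epsilon_theta; the
# linear schedule is likewise an illustrative assumption.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def eps_model(x_t, t):
    return np.zeros_like(x_t)                        # dummy noise prediction

def ddpm_sample(shape, rng):
    x = rng.standard_normal(shape)                   # x_T ~ N(0, I)
    for t in range(T, 0, -1):                        # t = T, T-1, ..., 1
        a_t, ab_t = alphas[t - 1], alpha_bars[t - 1]
        eps = eps_model(x, t)
        mean = (x - (1.0 - a_t) / np.sqrt(1.0 - ab_t) * eps) / np.sqrt(a_t)
        if t > 1:
            var = (1.0 - alpha_bars[t - 2]) / (1.0 - ab_t) * betas[t - 1]
            x = mean + np.sqrt(var) * rng.standard_normal(shape)
        else:
            x = mean                                 # final step: z = 0
    return x
```

In a real implementation, `eps_model` would be replaced by a forward pass through the trained network, typically batched on a GPU.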
This process is visualized below:

[Figure] Diagram illustrating the iterative DDPM sampling process: starting from noise xT, each step uses the noise predictor ϵθ to compute the mean μθ for the previous state, then samples xt−1 by adding scaled noise σtz (except for the last step, where z=0).
The quality of the final sample x0 depends heavily on the accuracy of the trained noise predictor ϵθ and the chosen noise schedule (βt) and number of diffusion steps (T). DDPM typically requires a large number of steps (e.g., T=1000) to achieve high-quality results, which can make sampling relatively slow. We will explore faster alternatives like DDIM in the subsequent sections.
© 2025 ApX Machine Learning