In the previous section, we outlined the general procedure for generating data using the reverse diffusion process, starting from noise $x_T$ and iteratively applying the learned denoising step to arrive at $x_0$. A significant characteristic of the standard Denoising Diffusion Probabilistic Models (DDPM) sampling process is its inherent stochasticity. Even if you were to start two generation processes from the exact same initial noise tensor $x_T$, you would likely end up with two different final samples $x_0$. Let's examine why this happens.
Recall from Chapter 3 that the reverse process aims to approximate the true posterior $p(x_{t-1} \mid x_t)$. In DDPM, we parameterize this reverse transition at each step $t$ as a Gaussian distribution:
$$p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\left(x_{t-1};\, \mu_\theta(x_t, t),\, \sigma_t^2 I\right)$$
Here, $\mu_\theta(x_t, t)$ is the mean of the distribution over the previous state $x_{t-1}$, given the current state $x_t$ and timestep $t$. Our trained neural network $\epsilon_\theta(x_t, t)$ is used to calculate this mean. The term $\sigma_t^2 I$ is the (isotropic) covariance, where $\sigma_t^2$ is typically a hyperparameter derived from the forward-process noise schedule (often related to $\beta_t$ or $\tilde{\beta}_t$).
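For reference, the standard DDPM parameterization (using $\alpha_t = 1 - \beta_t$ and $\bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s$ from the forward process, as in Chapter 3) expresses this mean in terms of the predicted noise:

$$\mu_\theta(x_t, t) = \frac{1}{\sqrt{\alpha_t}}\left(x_t - \frac{\beta_t}{\sqrt{1-\bar{\alpha}_t}}\,\epsilon_\theta(x_t, t)\right)$$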
The important point is that obtaining $x_{t-1}$ from $x_t$ involves sampling from this Gaussian distribution, not just calculating the mean $\mu_\theta(x_t, t)$. The sampling operation looks like this:
$$x_{t-1} = \mu_\theta(x_t, t) + \sigma_t z$$
where $z$ is a random vector sampled from a standard Gaussian distribution, $z \sim \mathcal{N}(0, I)$.
This addition of the scaled random noise $\sigma_t z$ at each step $t$ (from $T$ down to 2; at the very last step, $z$ is conventionally set to zero so that $x_0$ is just the predicted mean) is the source of the variability in the DDPM sampling process. Each step introduces a small amount of randomness.
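As a concrete illustration, here is a minimal sketch of a single reverse step in PyTorch. It assumes a trained noise-prediction network `eps_model` and precomputed schedule tensors `betas`, `alphas`, and `alpha_bars`; these names and interfaces are illustrative, not a specific library's API:

```python
import torch

def ddpm_step(eps_model, x_t, t, betas, alphas, alpha_bars):
    """One stochastic DDPM reverse step: sample x_{t-1} given x_t.

    betas, alphas, alpha_bars are 1-D tensors indexed by timestep
    (0-based here, so the final denoising step is t == 0).
    """
    beta_t = betas[t]
    alpha_t = alphas[t]
    alpha_bar_t = alpha_bars[t]

    # Predict the noise and form the mean mu_theta(x_t, t)
    t_batch = torch.full((x_t.shape[0],), t, device=x_t.device, dtype=torch.long)
    eps = eps_model(x_t, t_batch)
    mean = (x_t - beta_t / torch.sqrt(1.0 - alpha_bar_t) * eps) / torch.sqrt(alpha_t)

    # One standard choice for the reverse variance: sigma_t^2 = beta_t
    sigma_t = torch.sqrt(beta_t)

    # Fresh Gaussian noise at every step; by convention none is added at the last step
    z = torch.randn_like(x_t) if t > 0 else torch.zeros_like(x_t)
    return mean + sigma_t * z
```

The only stochastic operation here is the `torch.randn_like` call; everything else is a deterministic function of $x_t$ and $t$.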
Imagine the path from pure noise $x_T$ to the final sample $x_0$ as a sequence of $T$ steps. At each step, the model predicts the direction (the mean $\mu_\theta$), but then takes a slightly randomized step in that direction due to the added noise $\sigma_t z$.
This diagram shows how, starting from the same state $x_t$, sampling different noise vectors $z_t$ and $z_t'$ during the reverse step leads to slightly different states $x_{t-1}$ for two potential generation paths. Over many steps, these small divergences accumulate, resulting in distinct final samples $x_0$.
Because these small random perturbations accumulate over the entire sequence of $T$ steps, the final output $x_0$ reflects the sum of these random choices. This explains why running the DDPM sampling process multiple times, even with the same hyperparameters and trained model, yields a diverse set of generated samples. This inherent randomness is often desirable, as it allows a single trained model to generate a wide variety of outputs.
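Running the full loop twice from the same starting noise makes this visible. The sketch below reuses the hypothetical `ddpm_step` and schedule tensors from the previous sketch; because fresh values of $z$ are drawn inside every step, the two runs diverge even though $x_T$ is identical:

```python
@torch.no_grad()
def ddpm_sample(eps_model, x_T, betas, alphas, alpha_bars):
    """Full DDPM reverse process: iterate t = T-1, ..., 0."""
    x = x_T
    for t in reversed(range(len(betas))):
        x = ddpm_step(eps_model, x, t, betas, alphas, alpha_bars)
    return x

# Same x_T, two independent runs (eps_model and the schedule tensors
# are the same hypothetical objects used in the previous sketch).
x_T = torch.randn(1, 3, 32, 32)
sample_a = ddpm_sample(eps_model, x_T, betas, alphas, alpha_bars)
sample_b = ddpm_sample(eps_model, x_T, betas, alphas, alpha_bars)
print(torch.allclose(sample_a, sample_b))  # almost certainly False
```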
The magnitude of the variance $\sigma_t^2$ in the reverse step influences how much randomness is injected at each step. In the original DDPM paper, $\sigma_t^2$ is set based on the noise schedule $\beta_t$ used in the forward process (specifically, $\sigma_t^2 = \tilde{\beta}_t = \frac{1-\bar{\alpha}_{t-1}}{1-\bar{\alpha}_t}\beta_t$, or simply $\sigma_t^2 = \beta_t$). While this is a standard choice tied theoretically to the forward process, it is possible to use different values. Larger values of $\sigma_t^2$ generally lead to more diversity in the generated samples but might slightly reduce the quality or faithfulness to the learned data distribution if set too high. Smaller values reduce the stochasticity.
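To make the two standard variance choices concrete, the snippet below computes both from a hypothetical linear $\beta_t$ schedule; any schedule from Chapter 3 works the same way:

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)          # example linear schedule
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)
alpha_bars_prev = torch.cat([torch.ones(1), alpha_bars[:-1]])

# Two common choices for the reverse-process variance sigma_t^2:
sigma2_beta = betas                                                  # sigma_t^2 = beta_t
sigma2_tilde = (1.0 - alpha_bars_prev) / (1.0 - alpha_bars) * betas  # sigma_t^2 = beta_tilde_t

# beta_tilde_t <= beta_t, so this choice injects slightly less noise per step.
print(bool((sigma2_tilde <= sigma2_beta).all()))  # True
```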
Understanding this source of variance is also important when we consider faster sampling methods like Denoising Diffusion Implicit Models (DDIM), which we will discuss next. DDIM reinterprets the generation process and introduces a parameter (often denoted $\eta$) that controls the amount of stochasticity. When $\eta = 0$, DDIM sampling becomes deterministic given a starting noise $x_T$. This means for a fixed $x_T$, DDIM with $\eta = 0$ will always produce the same $x_0$. This contrasts sharply with DDPM, which is inherently stochastic due to the $\sigma_t z$ term added at each step. This trade-off between stochasticity (diversity) and determinism (reproducibility, potentially faster sampling) is a primary difference between DDPM and DDIM sampling strategies.
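As a brief preview of that discussion, DDIM exposes the per-step noise scale explicitly as a function of $\eta$; one common way to write it is

$$\sigma_t(\eta) = \eta \sqrt{\frac{1-\bar{\alpha}_{t-1}}{1-\bar{\alpha}_t}} \sqrt{1 - \frac{\bar{\alpha}_t}{\bar{\alpha}_{t-1}}}$$

Setting $\eta = 0$ makes $\sigma_t = 0$, so no noise is injected and the trajectory from a given $x_T$ is fixed, while $\eta = 1$ recovers $\sigma_t^2 = \tilde{\beta}_t$, the DDPM-style variance discussed above.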