Okay, we've trained our diffusion model. It has learned to predict the noise ϵ that was added to an image x0 to create a noisier version xt at a specific timestep t. The training objective, as discussed in Chapter 4, typically involved minimizing the difference between the predicted noise ϵθ(xt,t) and the actual noise ϵ used to generate xt.
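To make the training objective concrete, here is a miniature version of that loss. The array shapes and the zero-valued "prediction" are illustrative stand-ins, not the actual network from Chapter 4:

```python
import numpy as np

# The training objective in miniature: mean-squared error between the
# true noise eps and the model's prediction eps_theta(x_t, t).
# `eps_pred` is a placeholder for the network output.
rng = np.random.default_rng(1)
eps = rng.standard_normal((3, 8, 8))      # the actual noise added in the forward process
eps_pred = np.zeros_like(eps)             # stand-in for eps_theta(x_t, t)
loss = np.mean((eps - eps_pred) ** 2)     # simple MSE, as minimized during training
```

With a zero prediction, the loss is simply the mean squared magnitude of the noise itself; training drives eps_pred toward eps, pushing this value down.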
Now, how do we use this trained model, ϵθ, to generate new data samples? The generation process, often called sampling or inference, works by reversing the forward diffusion process. Instead of starting with data and adding noise, we start with pure noise and progressively remove it, guided by our model.
The starting point for generation is a sample xT drawn from a standard Gaussian distribution:
xT∼N(0,I)

This xT represents the state after the maximum number of noising steps in the forward process, essentially pure, unstructured noise. Our goal is to iteratively denoise this xT back through time, step by step, until we reach a clean sample x0.
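Drawing this starting point is a one-liner. The shape below (channels, height, width) is just an illustrative choice:

```python
import numpy as np

# Draw x_T from a standard Gaussian N(0, I): pure, unstructured noise.
# The (3, 64, 64) shape is illustrative, matching a small RGB image.
rng = np.random.default_rng(0)
x_T = rng.standard_normal((3, 64, 64))

print(x_T.shape)  # (3, 64, 64)
```

Every entry of x_T is an independent sample with mean 0 and variance 1; there is no image structure in it at all yet.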
The core idea is to use the trained noise prediction network ϵθ at each step t (from T down to 1) to estimate what the slightly less noisy sample xt−1 should look like, given the current noisy sample xt.
Imagine we are at timestep t with a sample xt. Our model ϵθ(xt,t) provides an estimate of the noise component within xt. We can use this estimate to take a step "backwards" towards xt−1. The specific mathematical operation depends on the chosen sampling algorithm (like DDPM or DDIM, which we'll detail next), but the fundamental principle is the same: use the predicted noise to guide the transition from xt to an approximation of xt−1.
This process is repeated iteratively:
Each reverse step pθ(xt−1∣xt) refines the sample, gradually transforming the initial unstructured noise into something that resembles the data distribution the model learned during training. If trained on images of faces, x0 should look like a face. If trained on images of cats, x0 should resemble a cat.
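The iterative loop above can be sketched as follows. This is a preview, not a reference implementation: the `model` callable stands in for the trained network ϵθ, the linear beta schedule is an illustrative assumption, and the exact update rule (the DDPM form) is derived properly in the next section:

```python
import numpy as np

def sample(model, shape, T=1000, seed=0):
    """Sketch of the reverse (denoising) loop.

    `model(x, t)` is a stand-in for the trained noise predictor
    eps_theta(x_t, t). The beta schedule and the update rule follow
    the DDPM form detailed in the next section.
    """
    rng = np.random.default_rng(seed)
    betas = np.linspace(1e-4, 0.02, T)    # illustrative linear noise schedule
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)

    x = rng.standard_normal(shape)        # x_T ~ N(0, I)
    for t in reversed(range(T)):          # t = T-1, ..., 0
        eps = model(x, t)                 # predicted noise eps_theta(x_t, t)
        # Mean of p_theta(x_{t-1} | x_t), computed from the predicted noise.
        x = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:                         # add fresh noise, except at the final step
            x += np.sqrt(betas[t]) * rng.standard_normal(shape)
    return x                              # approximate clean sample x_0

# A stand-in "model" that predicts zero noise, just to exercise the loop.
dummy_model = lambda x, t: np.zeros_like(x)
x0 = sample(dummy_model, shape=(3, 8, 8), T=50)
```

With a real trained network in place of `dummy_model`, each pass through the loop removes a little of the estimated noise, so the sample drifts from static toward the learned data distribution.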
The following diagram illustrates this iterative denoising flow:
The generation process starts with random noise xT and iteratively applies the learned denoising function ϵθ at each timestep t to produce progressively cleaner samples, culminating in the final output x0.
This overall flow provides the foundation for generating data. The next sections will detail the specific algorithms, starting with DDPM, that define exactly how the transition from xt to xt−1 is calculated using the predicted noise ϵθ(xt,t).