A diffusion model is trained to predict the noise $\epsilon$ that was added to an image $x_0$ to create a noisier version $x_t$ at a specific timestep $t$. The training objective typically involves minimizing the difference between the predicted noise $\epsilon_\theta(x_t, t)$ and the actual noise $\epsilon$ used to generate $x_t$.

Now, how do we use this trained model, $\epsilon_\theta$, to generate new data samples? The generation process, often called sampling or inference, works by reversing the forward diffusion process. Instead of starting with data and adding noise, we start with pure noise and progressively remove it, guided by our model.

The starting point for generation is a sample $x_T$ drawn from a standard Gaussian distribution:

$$ x_T \sim \mathcal{N}(0, \mathbf{I}) $$

This $x_T$ represents the state after the maximum number of noising steps in the forward process: essentially pure, unstructured noise. Our goal is to iteratively denoise this $x_T$ back through time, step by step, until we reach a clean sample $x_0$.

The core idea is to use the trained noise-prediction network $\epsilon_\theta$ at each step $t$ (from $T$ down to 1) to estimate what the slightly less noisy sample $x_{t-1}$ should look like, given the current noisy sample $x_t$.

Imagine we are at timestep $t$ with a sample $x_t$. Our model $\epsilon_\theta(x_t, t)$ provides an estimate of the noise component within $x_t$. We can use this estimate to take a step "backwards" towards $x_{t-1}$.
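To make this concrete, here is a minimal sketch of the starting point: drawing $x_T$ from $\mathcal{N}(0, \mathbf{I})$ and querying a noise predictor at timestep $T$. The `epsilon_theta` function here is a placeholder standing in for a trained neural network, not a real implementation.

```python
import numpy as np

def epsilon_theta(x_t, t):
    """Placeholder for the trained noise-prediction network.
    It returns zeros of the right shape; a real model would be
    a neural network taking (x_t, t) as input."""
    return np.zeros_like(x_t)

# Starting point for generation: pure Gaussian noise x_T ~ N(0, I).
rng = np.random.default_rng(0)
x_T = rng.standard_normal((3, 32, 32))  # e.g. a 3-channel 32x32 "image"

# At any timestep t, the model gives an estimate of the noise inside x_t.
T = 1000
eps_hat = epsilon_theta(x_T, T)
print(x_T.shape, eps_hat.shape)  # the noise estimate has the same shape as x_t
```

Note that the noise prediction lives in the same space as the sample itself: the model estimates, per element, how much noise is present in $x_t$.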
The specific mathematical operation depends on the chosen sampling algorithm (such as DDPM or DDIM, which we'll detail next), but the fundamental principle is the same: use the predicted noise to guide the transition from $x_t$ to an approximation of $x_{t-1}$.

This process is repeated iteratively:

1. Start with $x_T \sim \mathcal{N}(0, \mathbf{I})$.
2. For $t = T, T-1, \dots, 1$:
   - Predict the noise in $x_t$ using the model: $\hat{\epsilon}_t = \epsilon_\theta(x_t, t)$.
   - Use $\hat{\epsilon}_t$ and $x_t$ to calculate an estimate for $x_{t-1}$. This step typically involves the variance schedule ($\beta_t$ or $\alpha_t$) defined during the forward process and may add a small amount of controlled noise back in, depending on the sampler.
3. The final output $x_0$ is the generated sample.

Each reverse step $p_\theta(x_{t-1} \mid x_t)$ refines the sample, gradually transforming the initial unstructured noise into something that resembles the data distribution the model learned during training. If trained on images of faces, $x_0$ should look like a face; if trained on images of cats, $x_0$ should resemble a cat.

The following diagram illustrates this iterative denoising flow:

```dot
digraph G {
  rankdir=RL;
  node [shape=box, style=rounded, fontname="Arial", fontsize=10, margin=0.2];
  edge [fontname="Arial", fontsize=9];
  splines=ortho;
  newrank=true;
  subgraph cluster_gen {
    label="Generation Process (Reverse Diffusion)";
    color="#adb5bd";
    fontname="Arial";
    fontsize=11;
    style=dashed;
    T -> T_1 [label=" Use model \n $\\epsilon_\\theta(x_T, T)$ \n to estimate $x_{T-1}$ "];
    T_1 -> T_2 [label=" Use model \n $\\epsilon_\\theta(x_{T-1}, T-1)$ \n to estimate $x_{T-2}$ "];
    T_2 -> dots [label=" ... repeat ... "];
    dots -> t1 [label=" Use model \n $\\epsilon_\\theta(x_1, 1)$ \n to estimate $x_0$ "];
    T [label=" $x_T$ \n (Pure Noise) \n $\\sim \\mathcal{N}(0, \\mathbf{I})$ ", shape=ellipse, style=filled, fillcolor="#a5d8ff"];
    T_1 [label="$x_{T-1}$"];
    T_2 [label="$x_{T-2}$"];
    dots [label="...", shape=plaintext];
    t1 [label="$x_1$"];
    t0 [label=" $x_0$ \n (Generated Sample) ", shape=ellipse, style=filled, fillcolor="#96f2d7"];
    t1 -> t0;
  }
}
```

The generation process starts with random noise $x_T$ and iteratively applies the learned denoising function $\epsilon_\theta$ at each timestep $t$ to produce progressively cleaner samples, culminating in the final output $x_0$.

This overall flow provides the foundation for generating data. The next sections will detail the specific algorithms, starting with DDPM, that define exactly how the transition from $x_t$ to $x_{t-1}$ is calculated using the predicted noise $\epsilon_\theta(x_t, t)$.
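The iterative loop described above can be sketched in code. This is a minimal preview, not a reference implementation: it assumes a small linear $\beta$ schedule, uses a placeholder `epsilon_theta` that returns zeros, and previews the DDPM-style update for `reverse_step` as one concrete choice of sampler (derived properly in the next section).

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed pieces: a linear variance schedule from the forward process
# and a stand-in for the trained noise predictor.
T = 50
betas = np.linspace(1e-4, 0.02, T)   # beta_1 .. beta_T
alphas = 1.0 - betas                 # alpha_t = 1 - beta_t
alpha_bars = np.cumprod(alphas)      # cumulative products of alpha_t

def epsilon_theta(x_t, t):
    """Placeholder noise predictor; a real model is a trained network."""
    return np.zeros_like(x_t)

def reverse_step(x_t, t):
    """One sampler transition x_t -> x_{t-1} (DDPM-style update, as a sketch)."""
    eps_hat = epsilon_theta(x_t, t)
    # Remove the estimated noise component, rescaled by the schedule.
    coef = betas[t - 1] / np.sqrt(1.0 - alpha_bars[t - 1])
    mean = (x_t - coef * eps_hat) / np.sqrt(alphas[t - 1])
    if t > 1:
        # Add a small amount of controlled noise back in, except at the last step.
        mean = mean + np.sqrt(betas[t - 1]) * rng.standard_normal(x_t.shape)
    return mean

# Iterative generation: start from pure noise and denoise from t = T down to 1.
x = rng.standard_normal((8, 8))      # x_T
for t in range(T, 0, -1):
    x = reverse_step(x, t)
x_0 = x                              # the generated sample
print(x_0.shape)
```

With a real trained `epsilon_theta`, the same loop would progressively pull the noise toward the learned data distribution; here it merely demonstrates the control flow of the reverse process.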