DDPM and DDIM sampling theory describes a reverse process that iteratively denoises, starting from pure noise $x_T$. Implementation involves loops that perform this denoising step by step.We assume you have a trained noise prediction model, often a U-Net, which we'll refer to as model. This model takes a noisy input $x_t$ and the timestep $t$ and outputs the predicted noise $\epsilon_\theta(x_t, t)$. We also assume you have precomputed the noise schedule variables: $\beta_t$, $\alpha_t = 1 - \beta_t$, and $\bar{\alpha}t = \prod{i=1}^t \alpha_i$. These are typically stored in tensors.Implementing the DDPM Sampling LoopThe DDPM sampling process follows the reverse Markov chain defined in Chapter 3. Starting with $x_T \sim \mathcal{N}(0, \mathbf{I})$, we iteratively sample $x_{t-1}$ from $p_\theta(x_{t-1} | x_t)$ for $t = T, T-1, ..., 1$.Recall the core equation for the denoising step: $$ x_{t-1} = \frac{1}{\sqrt{\alpha_t}} \left( x_t - \frac{1 - \alpha_t}{\sqrt{1 - \bar{\alpha}t}} \epsilon\theta(x_t, t) \right) + \sigma_t z $$ where $z \sim \mathcal{N}(0, \mathbf{I})$ is fresh noise added at each step, and $\sigma_t^2$ is the variance of the reverse transition, often set to $\beta_t$ or $\tilde{\beta}t = \frac{1 - \bar{\alpha}{t-1}}{1 - \bar{\alpha}_t} \beta_t$. Using $\tilde{\beta}_t$ is common.Here's a Python-like implementation structure using PyTorch conventions:import torch def ddpm_sampler(model, n_steps, shape, device, betas, alphas, alphas_cumprod): """ Generates samples using the DDPM algorithm. Args: model: The trained noise prediction model (U-Net). n_steps (int): Total number of diffusion steps (T). shape: The shape of the desired output tensor (e.g., [batch_size, channels, height, width]). device: The device to perform computations on (e.g., 'cuda' or 'cpu'). betas (torch.Tensor): Tensor of beta values for the noise schedule. alphas (torch.Tensor): Tensor of alpha values (1 - beta). alphas_cumprod (torch.Tensor): Tensor of cumulative products of alpha values. Returns: torch.Tensor: The generated samples (x_0). """ # 1. Start with random noise x_T xt = torch.randn(shape, device=device) # Precompute required schedule variables sqrt_alphas = torch.sqrt(alphas).to(device) sqrt_one_minus_alphas_cumprod = torch.sqrt(1.0 - alphas_cumprod).to(device) # Calculate posterior variance (using \tilde{\beta}_t) alphas_cumprod_prev = torch.cat([torch.tensor([1.0], device=device), alphas_cumprod[:-1]]) posterior_variance = betas * (1.0 - alphas_cumprod_prev) / (1.0 - alphas_cumprod) # Avoid division by zero at t=0, although we stop at t=1 posterior_variance[0] = betas[0] * (1.0 - 1.0) / (1.0 - alphas_cumprod[0]) if alphas_cumprod[0] != 1 else torch.tensor(0.0) sqrt_posterior_variance = torch.sqrt(posterior_variance).to(device) # 2. Iteratively denoise for t = T, T-1, ..., 1 for t in reversed(range(n_steps)): # Prepare timestep tensor for the model # Model usually expects shape [batch_size] time_tensor = torch.full((shape[0],), t, dtype=torch.long, device=device) # Predict the noise using the model with torch.no_grad(): # No need to track gradients during sampling predicted_noise = model(xt, time_tensor) # Calculate the mean of the reverse distribution p(x_{t-1} | x_t) alpha_t = alphas[t] # Get scalar alpha_t sqrt_alpha_t = sqrt_alphas[t] # Get scalar sqrt_alpha_t sqrt_one_minus_alpha_cumprod_t = sqrt_one_minus_alphas_cumprod[t] # Get scalar term # Equation for the mean: (1/sqrt(alpha_t)) * (xt - ( (1-alpha_t) / sqrt(1-alpha_bar_t) ) * eps_theta) mean = (1 / sqrt_alpha_t) * (xt - ((1 - alpha_t) / sqrt_one_minus_alpha_cumprod_t) * predicted_noise) # Get the variance term \sigma_t variance = sqrt_posterior_variance[t] # Sample z from N(0, I) if t > 0 z = torch.randn_like(xt) if t > 0 else torch.zeros_like(xt) # No noise added at the last step (t=0) # Calculate x_{t-1} xt = mean + variance * z # Note: variance already includes sqrt # Optional: Clamp pixel values if generating images, e.g., xt.clamp_(-1., 1.) # 3. Return the final denoised sample x_0 return xt # --- Usage Example (assuming model and schedule variables are defined) --- # T = 1000 # image_shape = [1, 3, 64, 64] # Batch size 1, 3 channels, 64x64 pixels # device = 'cuda' if torch.cuda.is_available() else 'cpu' # generated_image = ddpm_sampler(unet_model, T, image_shape, device, betas_tensor, alphas_tensor, alphas_cumprod_tensor)Important aspects of this implementation:Initialization: Start with pure Gaussian noise xt.Loop: Iterate backwards from n_steps - 1 down to 0.Model Prediction: Call the model to get predicted_noise. Remember torch.no_grad() for efficiency.Mean Calculation: Implement the formula for the mean of $p_\theta(x_{t-1} | x_t)$. Be careful with tensor shapes and indexing.Noise Addition: Sample z and add it scaled by the variance term ($\sigma_t$). Note that no noise is added at the very last step when predicting $x_0$ from $x_1$.Update: Update xt to the newly calculated x_{t-1} for the next iteration.Implementing the DDIM Sampling LoopDDIM provides a faster sampling alternative by defining a non-Markovian generation process. It allows us to skip steps and uses a parameter $\eta$ (eta) to control the trade-off between determinism ($\eta=0$) and stochasticity ($\eta=1$). When $\eta=1$, it approximates DDPM. When $\eta=0$, the process becomes deterministic given $x_t$.The DDIM update rule is: $$ x_{t-1} = \sqrt{\bar{\alpha}{t-1}} \hat{x}0 + \sqrt{1 - \bar{\alpha}{t-1} - \sigma_t^2} \cdot \epsilon\theta(x_t, t) + \sigma_t z $$ where $\hat{x}_0$ is the predicted clean sample: $$ \hat{x}_0 = \frac{x_t - \sqrt{1 - \bar{\alpha}t} \epsilon\theta(x_t, t)}{\sqrt{\bar{\alpha}_t}} $$ and the standard deviation $\sigma_t$ is controlled by $\eta$: $$ \sigma_t^2 = \eta \tilde{\beta}t = \eta \frac{1 - \bar{\alpha}{t-1}}{1 - \bar{\alpha}_t} (1 - \frac{\bar{\alpha}t}{\bar{\alpha}{t-1}}) $$Notice that if $\eta=0$, then $\sigma_t = 0$, and the last term vanishes, making the process deterministic. DDIM also allows using a subsequence of timesteps (e.g., skipping every 10 steps) for faster generation.Here's a Python-like implementation structure for DDIM:import torch import numpy as np def ddim_sampler(model, n_inference_steps, shape, device, alphas_cumprod, eta=0.0): """ Generates samples using the DDIM algorithm. Args: model: The trained noise prediction model (U-Net). n_inference_steps (int): Number of denoising steps during inference (can be less than T). shape: The shape of the desired output tensor. device: The device to perform computations on. alphas_cumprod (torch.Tensor): Tensor of cumulative products of alpha values from the full schedule (T steps). eta (float): Controls the stochasticity (0.0 = deterministic, 1.0 = DDPM-like). Returns: torch.Tensor: The generated samples (x_0). """ # 1. Determine the timesteps to use for inference n_train_steps = len(alphas_cumprod) # Example: Use n_inference_steps equally spaced timesteps from [0, T-1] # More sophisticated spacing strategies might exist. inference_timesteps = np.linspace(0, n_train_steps - 1, n_inference_steps, dtype=int) inference_timesteps_tensor = torch.from_numpy(inference_timesteps).long().to(device) # 2. Start with random noise x_T (corresponding to the last timestep in our inference sequence) xt = torch.randn(shape, device=device) # Precompute required schedule variables on device sqrt_alphas_cumprod = torch.sqrt(alphas_cumprod).to(device) sqrt_one_minus_alphas_cumprod = torch.sqrt(1.0 - alphas_cumprod).to(device) # 3. Iteratively denoise using the inference timesteps for i, t in enumerate(reversed(inference_timesteps_tensor)): # Get the previous timestep index, handle boundary case t=0 t_prev_idx = max(0, len(inference_timesteps_tensor) - 1 - (i + 1)) t_prev = inference_timesteps_tensor[t_prev_idx] if i < len(inference_timesteps_tensor) - 1 else torch.tensor(-1, device=device) # -1 indicates the step to x_0 # Prepare timestep tensor for the model time_tensor = torch.full((shape[0],), t, dtype=torch.long, device=device) # Predict the noise using the model with torch.no_grad(): predicted_noise = model(xt, time_tensor) # Get alpha_cumprod terms for current and previous timesteps alpha_cumprod_t = alphas_cumprod[t] alpha_cumprod_t_prev = alphas_cumprod[t_prev] if t_prev >= 0 else torch.tensor(1.0, device=device) # alpha_cumprod for t=0 is 1 # Calculate the predicted original sample (x_0_hat) # Formula: (xt - sqrt(1 - alpha_cumprod_t) * predicted_noise) / sqrt(alpha_cumprod_t) sqrt_alpha_cumprod_t = sqrt_alphas_cumprod[t] sqrt_one_minus_alpha_cumprod_t = sqrt_one_minus_alphas_cumprod[t] x0_pred = (xt - sqrt_one_minus_alpha_cumprod_t * predicted_noise) / sqrt_alpha_cumprod_t # Calculate coefficients for xt-1 # Direction pointing to x_t dir_xt = torch.sqrt(1.0 - alpha_cumprod_t_prev - (eta**2 * (1.0 - alpha_cumprod_t_prev) / (1.0 - alpha_cumprod_t) * (1.0 - alpha_cumprod_t / alpha_cumprod_t_prev))) * predicted_noise # simplified: sqrt(1.0 - alpha_cumprod_t_prev - sigma_t^2) * predicted_noise # Calculate sigma_t variance = 0.0 if eta > 0: # Calculate sigma_t^2 = eta^2 * \tilde{beta}_t beta_t = 1.0 - (alpha_cumprod_t / alpha_cumprod_t_prev) if t_prev >= 0 and alpha_cumprod_t_prev != 0 else 0.0 # Approximated beta for the interval term1 = (1.0 - alpha_cumprod_t_prev) / (1.0 - alpha_cumprod_t) if (1.0 - alpha_cumprod_t) != 0 else 0.0 variance = eta * torch.sqrt(term1 * beta_t) # Sample random noise z z = torch.randn_like(xt) if eta > 0 else torch.zeros_like(xt) # Calculate x_{t-1} # Formula: sqrt(alpha_cumprod_t_prev) * x0_pred + dir_xt + variance * z sqrt_alpha_cumprod_t_prev = torch.sqrt(alpha_cumprod_t_prev) xt = sqrt_alpha_cumprod_t_prev * x0_pred + dir_xt + variance * z # Optional: Clamp # xt.clamp_(-1., 1.) # 4. Return the final sample x_0 return xt # --- Usage Example --- # inference_steps = 50 # Much fewer than T=1000 # eta_value = 0.0 # Deterministic sampling # generated_image_ddim = ddim_sampler(unet_model, inference_steps, image_shape, device, alphas_cumprod_tensor, eta=eta_value)Main differences in the DDIM implementation:Timestep Selection: We choose a subset of n_inference_steps from the original n_train_steps.eta Parameter: Controls the amount of noise added. eta=0 makes the process deterministic.Update Rule: The calculation of $x_{t-1}$ is different, explicitly calculating the predicted $x_0$ first.Skipping Steps: The loop iterates over the chosen inference_timesteps, potentially taking much larger jumps than the single steps in DDPM.digraph G { rankdir=LR; node [shape=circle, style=filled, fillcolor="#a5d8ff", fontname="sans-serif", fontsize=10]; edge [fontname="sans-serif", fontsize=9]; subgraph cluster_ddpm { label = "DDPM (T steps)"; bgcolor="#e9ecef"; style=rounded; T_ddpm [label="T"]; T1_ddpm [label="T-1"]; T2_ddpm [label="T-2"]; T3_ddpm [label="..."]; step1_ddpm [label="1"]; step0_ddpm [label="0"]; T_ddpm -> T1_ddpm -> T2_ddpm -> T3_ddpm -> step1_ddpm -> step0_ddpm [label=" denoise"]; } subgraph cluster_ddim { label = "DDIM (k < T steps)"; bgcolor="#e9ecef"; style=rounded; node [fillcolor="#96f2d7"]; T_ddim [label="T"]; Tk1_ddim [label="t_k-1"]; Tk2_ddim [label="..."]; t1_ddim [label="t_1"]; step0_ddim [label="0"]; T_ddim -> Tk1_ddim [label=" denoise (skip)"]; Tk1_ddim -> Tk2_ddim [label=" denoise (skip)"]; Tk2_ddim -> t1_ddim [label=" denoise (skip)"]; t1_ddim -> step0_ddim [label=" denoise"]; } }Comparison of sampling step sequences for DDPM and DDIM. DDIM often uses fewer, larger steps.These code structures provide a foundation. You'll need to integrate them with your specific model loading, data handling, and the precomputed noise schedule corresponding to your model's training. Experimenting with n_inference_steps and eta in DDIM allows you to explore the speed vs. quality trade-offs discussed previously.