As outlined earlier, the core computational load during diffusion model inference arises from the iterative sampling process. Each step typically involves a forward pass through the large U-Net model, leading to significant latency, especially when hundreds of steps (T) are required for high-quality generation. A primary optimization target, therefore, is to reduce the Number of Function Evaluations (NFE), which corresponds directly to the number of sampling steps, without significantly degrading the quality of the generated output. This reduction translates directly into lower latency, higher throughput, and reduced computational cost per generation.
The sampling process in diffusion models involves solving a reverse-time stochastic differential equation (SDE) or an ordinary differential equation (ODE). Traditional samplers like the one used in Denoising Diffusion Probabilistic Models (DDPM) often require many steps (T≈1000) because they simulate a stochastic process with small increments. Early efforts focused on developing solvers that could achieve comparable quality with significantly fewer steps.
A significant advancement came with Denoising Diffusion Implicit Models (DDIM). DDIM reformulates the generative process as solving an ODE, which results in a deterministic sampling path given a starting noise vector xT. This determinism is key because it allows for larger step sizes compared to stochastic samplers. DDIM introduces a parameter η that controls the stochasticity; setting η=0 yields the deterministic ODE variant.
The core idea behind DDIM's faster sampling is that it defines a non-Markovian forward process, which allows skipping steps during the reverse inference process. While the original DDPM sampler must traverse adjacent timesteps with small stochastic increments, the DDIM sampler can effectively jump, for instance, from t=500 to t=400 directly, approximating that segment of the path with a single model evaluation. This allows reducing the NFE from ≈1000 down to ≈50 or even ≈20 steps, albeit often with some trade-off in sample diversity or fine detail compared to the full DDPM process.
Diagram illustrating the difference in step count between a traditional stochastic sampler (DDPM) and a faster deterministic sampler (DDIM). DDIM takes larger jumps, reducing the total NFE.
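To make the update concrete, here is a minimal sketch of a single DDIM step in plain PyTorch. The variable names are ours: alpha_bar_t and alpha_bar_prev denote the cumulative noise-schedule products at the current and target timesteps, and eps is the model's noise prediction for x_t, assumed to have been computed already. Because alpha_bar_prev can belong to any earlier timestep, the same function covers both fine-grained and skipped-step schedules.
import torch

def ddim_step(x_t, eps, alpha_bar_t, alpha_bar_prev, eta=0.0):
    # Predict the clean sample x0 implied by the current noise estimate
    x0_pred = (x_t - (1 - alpha_bar_t) ** 0.5 * eps) / alpha_bar_t ** 0.5
    # Stochasticity scale; eta=0 gives the deterministic ODE variant
    sigma = eta * ((1 - alpha_bar_prev) / (1 - alpha_bar_t)) ** 0.5 \
                * (1 - alpha_bar_t / alpha_bar_prev) ** 0.5
    # Direction term that re-noises x0_pred to the target noise level
    dir_xt = (1 - alpha_bar_prev - sigma ** 2) ** 0.5 * eps
    noise = sigma * torch.randn_like(x_t) if eta > 0 else 0.0
    return alpha_bar_prev ** 0.5 * x0_pred + dir_xt + noise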
DDIM is essentially a first-order ODE solver. Just as in classical numerical methods, we can employ higher-order solvers to potentially achieve better accuracy per step, allowing for even fewer steps overall for a target quality level. Several methods adapted from numerical integration techniques have been applied successfully to diffusion models:
Pseudo Numerical Methods for Diffusion Models (PNDM): PNDM uses a linear multi-step method, reusing the model's noise estimates from the preceding sampler iterations (at timesteps t+1, t+2, and so on in the reverse direction) to improve the update toward xt. This can lead to faster convergence compared to DDIM.
DPM-Solver (Diffusion Probabilistic Model Solver): DPM-Solver and its variants (such as DPM-Solver++) are high-order solvers designed specifically for diffusion ODEs. They exploit the semi-linear structure of the diffusion ODE, solving its linear part exactly and approximating only the learned noise-prediction term, which yields highly accurate solutions even with very few NFEs (<20). DPM-Solver++ further improves stability and accuracy, often achieving excellent results in 10-15 steps. The core idea is to use the predicted derivative (noise estimate) at multiple points within a step interval to construct a more accurate polynomial approximation of the solution path, allowing for larger, more accurate steps.
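The exact DPM-Solver update is more involved than can be shown here, but the flavor of higher-order stepping can be illustrated with a generic Heun (second-order) step for the sampling ODE. The ode_derivative function below is a hypothetical wrapper that calls the model and returns dx/dt at a given state and time; it is not part of any particular library.
def heun_step(x, t, dt, ode_derivative):
    # Slope at the start of the interval
    d1 = ode_derivative(x, t)
    # Provisional first-order (Euler) step to the end of the interval
    x_euler = x + dt * d1
    # Slope at the provisional endpoint
    d2 = ode_derivative(x_euler, t + dt)
    # Average the two slopes (trapezoidal rule) for a second-order accurate update
    return x + 0.5 * dt * (d1 + d2)
Note that each Heun step costs two model evaluations; the benefit comes from being able to take far fewer, much larger steps for the same accuracy.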
Using these higher-order solvers can dramatically reduce inference time. For example, generating an image might take 5-10 seconds with DDIM at 50 steps, but only 1-2 seconds with DPM-Solver++ at 15 steps on the same hardware, assuming the NFE is the dominant factor.
Most samplers use a predefined, fixed schedule for the timesteps t they evaluate (e.g., linearly or quadratically spaced). However, the sampling trajectory does not change at a uniform rate across the diffusion process. Adaptive step size solvers attempt to adjust the step size Δt dynamically based on an estimate of the local error or the trajectory's curvature. While potentially more complex to implement and tune, they offer the promise of allocating computational effort more effectively, using smaller steps only when necessary.
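As a rough illustration of the idea, the following sketch uses classic step-doubling for local error estimation: take one full step and two half steps, compare the results, and grow or shrink Δt accordingly. The ode_step function is hypothetical (for example, the Heun step above), x is assumed to be a tensor, and the tolerance and growth factors are arbitrary placeholders.
def adaptive_solve(x, t_start, t_end, ode_step, dt_init=-0.05, tol=1e-3):
    t, dt = t_start, dt_init                 # dt is negative: time runs backward
    while t > t_end:
        dt = max(dt, t_end - t)              # never step past t_end
        full = ode_step(x, t, dt)            # one full step
        half = ode_step(ode_step(x, t, dt / 2), t + dt / 2, dt / 2)  # two half steps
        err = float((full - half).abs().max())   # local error estimate
        if err < tol:
            x, t = half, t + dt              # accept the finer estimate
            dt *= 1.5                        # smooth region: take larger steps
        else:
            dt *= 0.5                        # high curvature: retry with a smaller step
    return x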
Furthermore, the choice of timesteps (the "schedule") can impact quality for a fixed NFE. Research continues into optimal scheduling strategies, sometimes finding that concentrating steps in specific parts of the diffusion process (e.g., middle or later stages) yields better results for certain models or datasets.
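For example, with a fixed budget of 20 function evaluations drawn from 1000 training timesteps, linear and quadratic spacings distribute those evaluations quite differently; quadratic spacing (used in some DDIM implementations) concentrates them near t=0, the low-noise end of the process. The snippet below is purely illustrative.
import numpy as np

T, num_steps = 1000, 20
# Evenly spaced timesteps across the full range
linear_ts = np.linspace(0, T - 1, num_steps).round().astype(int)
# Quadratic spacing: denser near t=0 (low noise), sparser near t=T (high noise)
quadratic_ts = (np.linspace(0, (T - 1) ** 0.5, num_steps) ** 2).round().astype(int)
print(linear_ts)
print(quadratic_ts)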
Choosing an optimized sampler involves balancing speed (NFE) against generation quality. Quality is often measured using metrics like Fréchet Inception Distance (FID), which compares the distribution of generated images to real images, or CLIP Score, which measures alignment between generated images and text prompts.
It's essential to benchmark different samplers and step counts for your specific model and target application. A common approach is to plot FID or another quality metric against NFE or wall-clock inference time.
Illustrative comparison of different samplers showing FID (lower is better) and Inference Time against the Number of Function Evaluations (NFE). Note how higher-order solvers like DPM-Solver++ can achieve lower FID at significantly fewer steps (and thus lower time) compared to DDIM. Actual values depend heavily on the model, hardware, and implementation.
Modern libraries like Hugging Face's diffusers provide easy access to various pre-implemented samplers (called schedulers in the library's terminology). Swapping samplers often requires changing only a few lines of code when initializing the inference pipeline.
# Example using Hugging Face diffusers (illustrative)
import torch
from diffusers import DiffusionPipeline, DDIMScheduler, DPMSolverMultistepScheduler

# Half precision on a GPU is assumed here to keep latency reasonable
pipeline = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# Use DDIM with 50 steps
pipeline.scheduler = DDIMScheduler.from_config(pipeline.scheduler.config)
images_ddim = pipeline(prompt="An astronaut riding a horse", num_inference_steps=50).images

# Swap to DPM-Solver++ with 20 steps; only the scheduler object changes
pipeline.scheduler = DPMSolverMultistepScheduler.from_config(pipeline.scheduler.config)
images_dpm = pipeline(prompt="An astronaut riding a horse", num_inference_steps=20).images
When deploying diffusion models, selecting the right sampler and number of steps is a critical optimization lever. It directly impacts user-perceived latency and the overall cost of running the inference service. Thorough benchmarking is necessary to determine the optimal operating point that meets your application's quality requirements while minimizing NFE. This often involves testing several candidate samplers (e.g., DDIM, DPM-Solver++) across a range of step counts (e.g., 10, 15, 20, 30, 50) and evaluating the resulting images both quantitatively (FID, CLIP) and qualitatively.
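A minimal sketch of such a sweep with diffusers follows. It records wall-clock latency only and assumes that quality metrics such as FID or CLIP score are computed afterwards on the saved images with a separate evaluation library; the model ID, prompt, and step counts are placeholders.
import itertools
import time
import torch
from diffusers import DiffusionPipeline, DDIMScheduler, DPMSolverMultistepScheduler

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")
scheduler_classes = {"DDIM": DDIMScheduler, "DPM-Solver++": DPMSolverMultistepScheduler}

results = []
for name, steps in itertools.product(scheduler_classes, [10, 15, 20, 30, 50]):
    pipe.scheduler = scheduler_classes[name].from_config(pipe.scheduler.config)
    start = time.perf_counter()
    # Images would typically be saved here for offline FID / CLIP evaluation
    images = pipe(prompt="An astronaut riding a horse", num_inference_steps=steps).images
    results.append((name, steps, time.perf_counter() - start))

for name, steps, seconds in results:
    print(f"{name:12s} steps={steps:3d} latency={seconds:5.1f}s")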