Once a consistency model $f_\theta(x_t, t)$ is trained, whether through distillation or standalone training, it provides a direct mapping from any point $(x_t, t)$ on a probability flow ODE trajectory to an estimate of the trajectory's origin, $\hat{x}_0$. This capability allows for significantly accelerated sampling compared to traditional diffusion models, which must simulate the reverse process iteratively. We will now examine how to generate samples using both single-step and multi-step approaches.

## Single-Step Generation

The most direct application of a consistency model is single-step generation. This method produces a sample in a single forward pass of the model, offering maximum inference speed.

The procedure is straightforward:

1. Sample an initial noise vector $x_T$ from the prior distribution. Under the variance-exploding parameterization implied by the noise-injection formula used later (where the state at time $t$ has standard deviation $t$), this prior is $x_T \sim \mathcal{N}(0, T^2 I)$. Here, $T$ represents the maximum noise level, or time, in the diffusion process.
2. Evaluate the consistency model $f_\theta$ on this noise vector and the corresponding maximum time $T$:
   $$ \hat{x}_0 = f_\theta(x_T, T) $$
3. The output $\hat{x}_0$ is the generated sample.

This single evaluation replaces the hundreds or thousands of steps often required by DDPM or DDIM samplers. The underlying principle relies on the trained model's ability to approximate the function that maps any point $(x_t, t)$ directly to the starting point $x_0$ of its ODE trajectory.

**Advantages:**

- **Extreme speed:** Single-step generation is the fastest possible method, requiring only one network evaluation per sample.
- **Simplicity:** The sampling algorithm is trivial to implement.

**Disadvantages:**

- **Potential quality trade-off:** Samples generated in a single step may be of lower quality than those produced by multi-step consistency sampling or traditional diffusion models, especially if the consistency property is not perfectly learned across all timesteps. Errors in the single mapping from $x_T$ to $\hat{x}_0$ are never corrected.

Single-step sampling is particularly useful in applications where generation speed is the primary concern, such as real-time interactive systems, even if it means a slight compromise on maximum fidelity.
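The procedure is short enough to show in full. Below is a minimal sketch, assuming `f_theta` is any callable with signature `f(x_t, t) -> x0_hat` and an EDM-style maximum noise level; the default `T=80.0` is an illustrative assumption, not prescribed by the text.

```python
import torch

@torch.no_grad()
def sample_single_step(f_theta, shape, T=80.0, device="cpu"):
    """Single-step consistency sampling: one forward pass from noise to x_0.

    `f_theta` is assumed to be the trained consistency model, i.e. any
    callable f(x_t, t) -> x0_hat. T=80.0 is an illustrative default.
    """
    # Prior sample: standard deviation T under sigma(t) = t.
    x_T = torch.randn(shape, device=device) * T
    t = torch.full((shape[0],), T, device=device)
    # A single network evaluation yields the sample.
    return f_theta(x_T, t)
```

For example, `sample_single_step(model, (16, 3, 32, 32))` would produce a batch of 16 images with exactly one network call.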
## Multi-Step Generation

To improve sample quality while still maintaining a significant speed advantage over traditional methods, consistency models can be used in a multi-step generation process. This approach applies the consistency property iteratively, alternating denoising with noise injection to refine the estimate.

The multi-step algorithm typically proceeds as follows (a code sketch appears after this subsection):

1. Choose the number of sampling steps $N$ (e.g., 2, 5, 10), which is much smaller than typical diffusion step counts.
2. Define a sequence of timesteps $T = t_N > t_{N-1} > \dots > t_1 > t_0 = 0$. A common choice is uniform spacing on some scale (e.g., linear or log-linear). Let $\epsilon$ be a small positive constant representing the minimum noise level (e.g., $\epsilon = 0.002$).
3. Sample the initial noise vector: $x_{t_N} \sim \mathcal{N}(0, T^2 I)$.
4. Iterate from $i = N$ down to $1$:
   a. Take the current state $x_{t_i}$ and time $t_i$.
   b. Estimate the origin using the consistency model: $\hat{x}_0^{(i)} = f_\theta(x_{t_i}, t_i)$.
   c. If $i > 1$:
      i. Determine the next timestep $t_{i-1}$.
      ii. Sample Gaussian noise: $z_i \sim \mathcal{N}(0, I)$.
      iii. Add noise to the estimated origin to obtain the state at the next timestep:
      $$ x_{t_{i-1}} = \hat{x}_0^{(i)} + \sqrt{t_{i-1}^2 - \epsilon^2} \cdot z_i $$
      This uses the estimate $\hat{x}_0^{(i)}$ and injects the appropriate amount of noise to simulate $x_{t_{i-1}}$ lying on the same trajectory. Subtracting $\epsilon^2$ accounts for the residual noise level $\epsilon$ already present in the estimate, so the total variance at $t_{i-1}$ comes out correct.
   d. If $i = 1$: the final estimate $\hat{x}_0^{(1)}$ is the result.
5. Return the final estimate $\hat{x}_0 = \hat{x}_0^{(1)}$.

This process uses the consistency model at each step $i$ to obtain an estimate $\hat{x}_0^{(i)}$ of the origin from the current state $x_{t_i}$. It then "jumps" to the next timestep $t_{i-1}$ by adding appropriately scaled noise back to this estimate, effectively correcting the path toward the origin at multiple points along the trajectory.

**Advantages:**

- **Improved sample quality:** Compared to single-step generation, the iterative refinement generally yields higher-fidelity samples that more closely match the quality of the original diffusion model (if distilled) or the target distribution.
- **Flexible trade-off:** The number of steps $N$ tunes the balance between computational cost and sample quality. Even with a small $N$ (e.g., 2-10), significant quality improvements over single-step generation are often observed.

**Disadvantages:**

- **Increased computation:** Requires $N$ network evaluations, making it slower than single-step generation, though still much faster than traditional diffusion samplers.
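The loop above can be written compactly. The sketch below assumes the same variance-exploding parameterization as before, a decreasing schedule `timesteps = [t_N, ..., t_1]` with $t_1 \geq \epsilon$, and `f_theta` as an arbitrary callable `f(x_t, t) -> x0_hat`; names and defaults are illustrative.

```python
import torch

@torch.no_grad()
def sample_multistep(f_theta, shape, timesteps, eps=0.002, device="cpu"):
    """Multi-step consistency sampling over a decreasing timestep schedule."""
    t_max = timesteps[0]
    # Step 3: initial state is pure noise at the highest level t_N.
    x = torch.randn(shape, device=device) * t_max
    x0_hat = f_theta(x, torch.full((shape[0],), t_max, device=device))
    for t in timesteps[1:]:  # t_{N-1}, ..., t_1
        # Step 4c: re-noise the current estimate down to level t,
        # subtracting eps^2 so the total variance at t is correct...
        z = torch.randn_like(x0_hat)
        x = x0_hat + (t**2 - eps**2) ** 0.5 * z
        # ...then denoise again with the consistency model (step 4b).
        x0_hat = f_theta(x, torch.full((shape[0],), t, device=device))
    return x0_hat
```

Each iteration costs one network evaluation, so a schedule of length $N$ gives exactly $N$ forward passes.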
The following diagram illustrates the difference between single-step and multi-step sampling paths:

```dot
digraph G {
    rankdir=LR;
    node [shape=point, width=0.1, height=0.1];
    edge [arrowhead=vee, arrowsize=0.7];
    splines=true;

    subgraph cluster_single {
        label = "Single-Step Sampling";
        style=dashed; bgcolor="#e9ecef"; color="#adb5bd";
        node [color="#1c7ed6"];
        edge [color="#1c7ed6"];
        xT_s [label="", shape=circle, style=filled, fillcolor="#fa5252", xlabel="x_T"];
        x0_s_hat [label="", shape=star, style=filled, fillcolor="#40c057", xlabel="x_0"];
        xT_s -> x0_s_hat [label=" f_θ(x_T, T)", fontsize=10, fontcolor="#495057"];
    }

    subgraph cluster_multi {
        label = "Multi-Step Sampling (N=3)";
        style=dashed; bgcolor="#e9ecef"; color="#adb5bd";
        node [color="#7048e8"];
        edge [color="#7048e8"];
        xT_m [label="", shape=circle, style=filled, fillcolor="#fa5252", xlabel="x_T = x_t3"];
        xt2 [label="", shape=circle, xlabel="x_t2"];
        xt1 [label="", shape=circle, xlabel="x_t1"];
        x0_m_hat [label="", shape=star, style=filled, fillcolor="#40c057", xlabel="x_0"];
        xT_m -> xt2 [label=" Step 1 (Estimate x0, Add noise)", fontsize=9, fontcolor="#495057", style=dashed];
        xt2 -> xt1 [label=" Step 2 (Estimate x0, Add noise)", fontsize=9, fontcolor="#495057", style=dashed];
        xt1 -> x0_m_hat [label=" Step 3 (Estimate x0)", fontsize=9, fontcolor="#495057", style=dashed];
    }

    // Invisible edges to roughly align the two clusters
    xT_s -> xT_m [style=invis];
    x0_s_hat -> x0_m_hat [style=invis];
}
```

Single-step sampling directly maps the initial noise $x_T$ to the estimate $\hat{x}_0$. Multi-step sampling follows an iterative path, using intermediate estimates and noise-injection steps ($x_{t_3} \rightarrow x_{t_2} \rightarrow x_{t_1}$) to reach the final $\hat{x}_0$.

## Practical Notes

- **Timestep schedule:** The choice of the sequence $t_N, \dots, t_1$ in multi-step sampling can affect performance. Schedules that place more steps at lower noise levels tend to be beneficial, similar to findings in accelerated diffusion sampling; one such schedule is sketched at the end of this section.
- **Noise term:** The $\sqrt{t_{i-1}^2 - \epsilon^2}$ factor is important for correctly scaling the injected noise; ensure $t_1 \geq \epsilon$. The exact form may vary slightly depending on the noise-schedule parameterization used during training.
- **Boundary condition $\epsilon$:** This small value prevents numerical issues as $t_{i-1}$ approaches zero and keeps the noise standard deviation real. Results are usually not highly sensitive to its exact value, which simply needs to be small and positive.

By choosing between single-step and multi-step generation, you can balance the need for inference speed against the desired sample quality, making consistency models a versatile tool for efficient generative modeling.
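As a concrete illustration of the timestep-schedule note above, here is a sketch of the $\rho$-spaced schedule of Karras et al. (2022), which concentrates steps at low noise levels. This particular schedule, the function name, and the defaults ($t_{\min}=0.002$, $t_{\max}=80$, $\rho=7$) are assumptions for illustration, not prescribed by the text.

```python
import torch

def karras_timesteps(n_steps, t_min=0.002, t_max=80.0, rho=7.0):
    """Decreasing timestep schedule with denser spacing at low noise.

    Returns [t_N, ..., t_1] for use with the multi-step sampler
    sketched earlier; all defaults are illustrative.
    """
    ramp = torch.linspace(0.0, 1.0, n_steps)
    # Interpolate between t_max and t_min in t^(1/rho) space; rho > 1
    # concentrates steps near t_min, per the practical note above.
    ts = (t_max ** (1 / rho) + ramp * (t_min ** (1 / rho) - t_max ** (1 / rho))) ** rho
    return ts.tolist()
```

With `t_min` equal to the sampler's `eps`, the final jump injects no extra noise and the last network evaluation happens exactly at the boundary $\epsilon$.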