As we established, directly computing the true reverse probability $q(x_{t-1} \mid x_t)$ is generally intractable. Our strategy is to approximate this reverse transition with a parameterized distribution, typically a Gaussian, learned by a neural network:

$$p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\left(x_{t-1};\; \mu_\theta(x_t, t),\; \Sigma_\theta(x_t, t)\right)$$

The network must estimate the parameters of this Gaussian distribution, specifically its mean $\mu_\theta(x_t, t)$ and variance $\Sigma_\theta(x_t, t)$.
While the network could be trained to predict the mean $\mu_\theta(x_t, t)$ directly, or even the denoised sample $x_0$, a different approach has proven remarkably effective in practice: predicting the noise component $\epsilon$ that was added during the forward process at timestep $t$.
Let's see why this is useful. Recall the closed-form expression for sampling $x_t$ directly from $q(x_t \mid x_0)$ in the forward process:

$$x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon$$
Here, $\epsilon \sim \mathcal{N}(0, I)$ is a standard Gaussian noise sample, and $\bar{\alpha}_t$ is derived from the noise schedule. This equation links the original data $x_0$, the noisy version $x_t$, and the noise $\epsilon$ added to reach that state.
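This closed-form forward sample is easy to check numerically. In the sketch below the linear schedule, the timestep count `T`, and the name `q_sample` are illustrative assumptions, not definitions from the text:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear noise schedule beta_1 ... beta_T (illustrative values).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)  # bar{alpha}_t = prod_{s<=t} alpha_s

def q_sample(x0, t, eps):
    """Sample x_t from q(x_t | x_0) via the closed form
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps."""
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps

x0 = rng.standard_normal(4)   # toy "data" vector
eps = rng.standard_normal(4)  # standard Gaussian noise
xt = q_sample(x0, t=500, eps=eps)
```

Note how the two coefficients trade off: at small $t$ the sample stays close to $x_0$, while at $t = T$ it is almost pure noise $\epsilon$.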
If our neural network, let's call it $\epsilon_\theta$, can accurately predict the noise $\epsilon$ based on the noisy input $x_t$ and the timestep $t$, we can use this prediction, $\epsilon_\theta(x_t, t)$, to inform our estimate of the previous state $x_{t-1}$.
How does predicting noise help estimate the mean $\mu_\theta(x_t, t)$ of the reverse step? The original Denoising Diffusion Probabilistic Models (DDPM) paper showed that the mean of the true reverse transition $q(x_{t-1} \mid x_t, x_0)$ can be expressed as:

$$\tilde{\mu}_t(x_t, x_0) = \frac{\sqrt{\bar{\alpha}_{t-1}}\, \beta_t}{1 - \bar{\alpha}_t}\, x_0 + \frac{\sqrt{\alpha_t}\,(1 - \bar{\alpha}_{t-1})}{1 - \bar{\alpha}_t}\, x_t$$
where $\alpha_t = 1 - \beta_t$ and $\bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s$. Notice this expression depends on the original data $x_0$ (through the first term), which we don't have during generation.
However, we can rearrange the first equation to get an estimate of $x_0$ if we know $x_t$ and $\epsilon$:

$$x_0 = \frac{1}{\sqrt{\bar{\alpha}_t}} \left( x_t - \sqrt{1 - \bar{\alpha}_t}\, \epsilon \right)$$
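This rearrangement can be verified numerically: given the true noise, it recovers $x_0$ exactly. The single value of $\bar{\alpha}_t$ below is an arbitrary illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative value of alpha_bar_t at one timestep (hypothetical schedule).
alpha_bar_t = 0.5

x0 = rng.standard_normal(3)
eps = rng.standard_normal(3)

# Forward closed form: x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps
xt = np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps

# Rearranged: x0 = (x_t - sqrt(1 - alpha_bar_t) * eps) / sqrt(alpha_bar_t)
x0_hat = (xt - np.sqrt(1.0 - alpha_bar_t) * eps) / np.sqrt(alpha_bar_t)
```

During generation the true $\epsilon$ is unknown, which is exactly where the network's prediction $\epsilon_\theta(x_t, t)$ comes in.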
By substituting our network's noise prediction $\epsilon_\theta(x_t, t)$ for $\epsilon$ in the equation for $\tilde{\mu}_t$, we arrive at an expression for the mean of our approximate reverse transition $p_\theta(x_{t-1} \mid x_t)$:

$$\mu_\theta(x_t, t) = \frac{1}{\sqrt{\alpha_t}} \left( x_t - \frac{\beta_t}{\sqrt{1 - \bar{\alpha}_t}}\, \epsilon_\theta(x_t, t) \right)$$
This establishes a direct link: if our network successfully predicts the noise $\epsilon$ added at step $t$, we can calculate the mean $\mu_\theta(x_t, t)$ required for the denoising step from $x_t$ to $x_{t-1}$. The variance $\Sigma_\theta(x_t, t)$ is often fixed to a value related to the forward process variances (e.g., $\sigma_t^2 I$ with $\sigma_t^2 = \beta_t$), or sometimes also learned; the noise prediction is primarily used to determine the mean.
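A quick numerical check, using illustrative values for $\beta_t$ and $\bar{\alpha}_t$, confirms that the $x_0$-based and noise-based forms of the mean agree when the true noise is plugged in:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical per-step quantities at one timestep t (illustrative values).
beta_t = 0.02
alpha_t = 1.0 - beta_t
alpha_bar_t = 0.6
alpha_bar_prev = alpha_bar_t / alpha_t  # bar{alpha}_{t-1}

x0 = rng.standard_normal(3)
eps = rng.standard_normal(3)
xt = np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps

# Posterior mean written in terms of x0 (needs the original data):
mu_posterior = (np.sqrt(alpha_bar_prev) * beta_t / (1 - alpha_bar_t)) * x0 \
             + (np.sqrt(alpha_t) * (1 - alpha_bar_prev) / (1 - alpha_bar_t)) * xt

# The same mean written in terms of the noise (what the network predicts):
mu_from_eps = (xt - beta_t / np.sqrt(1 - alpha_bar_t) * eps) / np.sqrt(alpha_t)
```

The two forms are algebraically identical, so at generation time we can use the second with $\epsilon_\theta(x_t, t)$ in place of $\epsilon$.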
Parameterizing the reverse process by predicting noise offers several advantages:

- **A simple training objective.** The loss reduces to a mean squared error between the true and predicted noise, $\|\epsilon - \epsilon_\theta(x_t, t)\|^2$, which is stable and easy to optimize.
- **A consistent prediction target.** The network always predicts a standard Gaussian quantity, regardless of the timestep, rather than a target whose scale changes with $t$.
- **Strong empirical results.** The DDPM paper found that predicting the noise yielded better sample quality than predicting the mean directly.
Therefore, the standard approach is to train a neural network $\epsilon_\theta(x_t, t)$ that takes the noisy data $x_t$ and the timestep $t$ as input and outputs a prediction of the noise $\epsilon$ that was used to generate $x_t$ from $x_0$. This predicted noise then lets us compute the parameters (specifically the mean) of the approximate reverse distribution $p_\theta(x_{t-1} \mid x_t)$, enabling the step-by-step generation process.
The neural network takes the current noisy sample $x_t$ and the timestep $t$ as input. Its goal is to predict the noise $\epsilon$ that was likely added to the original data $x_0$ to produce $x_t$. This prediction is the core component used to guide the reverse denoising step.
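Putting the pieces together, a minimal ancestral-sampling loop might look like the sketch below. The `eps_theta` stand-in (it simply returns zeros) takes the place of a trained network, the short schedule is illustrative, and the fixed variance $\sigma_t^2 = \beta_t$ is one common choice:

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative short schedule (a real model would use many more steps).
T = 50
betas = np.linspace(1e-4, 0.05, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def eps_theta(xt, t):
    # Stand-in for a trained noise-prediction network epsilon_theta(x_t, t).
    # A real model would be a neural net; here we just predict zero noise.
    return np.zeros_like(xt)

def p_sample_loop(shape):
    """Ancestral sampling: start from pure noise x_T ~ N(0, I) and apply
    the noise-predicted mean at each step, with fixed variance beta_t."""
    x = rng.standard_normal(shape)
    for t in range(T - 1, -1, -1):
        # mu_theta(x_t, t) = (x_t - beta_t / sqrt(1 - alpha_bar_t) * eps) / sqrt(alpha_t)
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_theta(x, t)) \
               / np.sqrt(alphas[t])
        noise = rng.standard_normal(shape) if t > 0 else 0.0  # no noise at t = 0
        x = mean + np.sqrt(betas[t]) * noise
    return x

sample = p_sample_loop((4,))
```

Swapping `eps_theta` for a trained network turns this sketch into the full DDPM generation procedure.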