The forward diffusion process is a systematic way of corrupting data $x_0$ by iteratively adding Gaussian noise over $T$ timesteps.
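As a concrete illustration, here is a minimal sketch of one forward noising step in PyTorch. The linear variance schedule and the names `betas` and `forward_step` are illustrative assumptions, not something fixed by this section:

```python
import torch

# Illustrative linear variance schedule (beta_1, ..., beta_T); the exact
# schedule is a design choice, not prescribed by this section.
T = 1000
betas = torch.linspace(1e-4, 0.02, T)

def forward_step(x_prev: torch.Tensor, t: int) -> torch.Tensor:
    """Sample x_t from the standard Gaussian forward transition:
    q(x_t | x_{t-1}) = N(sqrt(1 - beta_t) * x_{t-1}, beta_t * I)."""
    beta_t = betas[t]
    noise = torch.randn_like(x_prev)
    return torch.sqrt(1.0 - beta_t) * x_prev + torch.sqrt(beta_t) * noise

# Walking the full chain turns a data sample x_0 into (nearly) pure noise x_T.
x = torch.randn(3, 32, 32)  # stand-in for a data sample x_0
for t in range(T):
    x = forward_step(x, t)
```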
At the end of this process, the sample $x_T$ is essentially indistinguishable from pure Gaussian noise. Our objective now is generative modeling: we want to create new data samples that look like they came from the original data distribution $q(x_0)$. To achieve this, we need to figure out how to reverse the noising process.
Imagine starting with a sample $x_T$ drawn from a simple distribution, like the standard Gaussian $\mathcal{N}(0, I)$. If we could somehow reverse each step of the forward process, moving backward in time from $t = T$ down to $t = 0$, we could potentially transform this initial noise sample $x_T$ into a realistic data sample $x_0$:

$$x_T \rightarrow x_{T-1} \rightarrow \cdots \rightarrow x_1 \rightarrow x_0$$
This reversal defines the generative pathway of the diffusion model.
Diagram illustrating the forward (noising) and reverse (generative) processes as Markov chains moving in opposite directions.
The forward process is defined by the transition probability $q(x_t \mid x_{t-1})$, which specifies how to get from $x_{t-1}$ to $x_t$ by adding a controlled amount of noise. The core goal of the reverse process is to learn the opposite transition: the probability distribution $q(x_{t-1} \mid x_t)$. This distribution tells us, given a noisy sample $x_t$ at timestep $t$, what the distribution over possible "less noisy" samples $x_{t-1}$ at the previous timestep looks like.
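For reference, the forward transition with a variance schedule $\beta_1, \dots, \beta_T$ takes the standard Gaussian form used in DDPM (this follows the usual convention rather than anything derived in this section):

$$q(x_t \mid x_{t-1}) = \mathcal{N}\!\left(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t I\right)$$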
If we can successfully model this reverse transition probability $q(x_{t-1} \mid x_t)$ for all relevant timesteps (from $t = T$ down to $t = 1$), we can implement the generation procedure:

1. Sample $x_T \sim \mathcal{N}(0, I)$.
2. For $t = T, T-1, \dots, 1$, sample $x_{t-1}$ from the learned reverse transition.
3. Return $x_0$ as the generated sample.
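Below is a minimal sketch of this sampling loop, assuming a trained network `model(x, t)` that predicts the noise present in $x_t$ (the noise-prediction parameterization from DDPM; the function name, the schedule, and the fixed variance $\beta_t$ are illustrative assumptions, not details given in this section):

```python
import torch

# Variance schedule and derived quantities (illustrative linear schedule).
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)

@torch.no_grad()
def sample(model, shape):
    """Walk the reverse chain x_T -> x_0 by ancestral sampling.

    `model(x, t)` is assumed to predict the noise eps contained in x_t;
    how such a network is parameterized and trained is covered later.
    """
    x = torch.randn(shape)  # x_T ~ N(0, I)
    for t in reversed(range(T)):
        eps = model(x, torch.full((shape[0],), t))
        # Mean of p_theta(x_{t-1} | x_t) under the noise-prediction form.
        mean = (x - betas[t] / torch.sqrt(1.0 - alpha_bars[t]) * eps) / torch.sqrt(alphas[t])
        if t > 0:
            # Add fresh noise at every step except the final one.
            x = mean + torch.sqrt(betas[t]) * torch.randn_like(x)
        else:
            x = mean
    return x  # the generated sample x_0
```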
Therefore, the central challenge in building a diffusion model is to effectively estimate or parameterize these reverse conditional probabilities $q(x_{t-1} \mid x_t)$. The forward process was designed to be mathematically convenient (adding Gaussian noise). As we will see in the following sections, the true reverse transition $q(x_{t-1} \mid x_t, x_0)$ (note the conditioning on $x_0$) has a known, tractable form, but calculating the desired $q(x_{t-1} \mid x_t)$ requires knowing the entire data distribution, which is exactly what we are trying to learn. This intractability motivates the use of powerful function approximators, specifically neural networks, to learn a model $p_\theta(x_{t-1} \mid x_t)$ that approximates the true reverse transitions.
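Concretely, the conventional choice (as in DDPM; this parameterization is stated here for orientation rather than derived in this section) is to model each reverse transition as a Gaussian whose parameters are produced by a neural network:

$$p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\!\left(x_{t-1};\ \mu_\theta(x_t, t),\ \Sigma_\theta(x_t, t)\right)$$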
Our goal is set: learn a model $p_\theta(x_{t-1} \mid x_t)$ that can predict the previous state $x_{t-1}$ given the current state $x_t$, enabling us to walk backward along the chain from noise to data. The next sections will detail how we formulate and train a neural network to perform this task.