In the previous chapter, we explored the forward diffusion process, a systematic way of corrupting data x0 by adding Gaussian noise iteratively over T timesteps:
x0 → x1 → x2 → ⋯ → xT

At the end of this process, xT is essentially indistinguishable from pure Gaussian noise. Our objective now is generative modeling: we want to create new data samples that look like they came from the original data distribution q(x0). To achieve this, we need to figure out how to reverse the noising process.
Imagine starting with a sample xT drawn from a simple distribution, like the standard Gaussian N(0,I). If we could somehow reverse each step of the forward process, moving backward in time from t=T down to t=1, we could potentially transform this initial noise sample xT into a realistic data sample x0:
xT → xT−1 → xT−2 → ⋯ → x0

This reversal defines the generative pathway of the diffusion model.
Diagram illustrating the forward (noising) and reverse (generative) processes as Markov chains moving in opposite directions.
The forward process is defined by the transition probability q(xt∣xt−1), which specifies how to get from xt−1 to xt by adding a controlled amount of noise. The core goal of the reverse process is to learn the opposite transition: the probability distribution p(xt−1∣xt). This distribution tells us, given a noisy sample xt at timestep t, what the distribution over possible "less noisy" samples xt−1 at the previous timestep looks like.
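Concretely, and assuming the variance schedule notation βt from the previous chapter, the forward transition referenced above is the Gaussian

$$
q(x_t \mid x_{t-1}) = \mathcal{N}\!\left(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t \mathbf{I}\right),
$$

so the reverse process must learn how to undo exactly this kind of small Gaussian perturbation, one step at a time.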
If we can successfully model this reverse transition probability p(xt−1∣xt) for all relevant timesteps t (from T down to 1), we can implement the generation procedure: sample xT from the standard Gaussian N(0, I), then for t = T, T−1, …, 1, draw xt−1 from p(xt−1∣xt); after the final step, x0 is our generated sample. A code sketch of this loop follows.
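The sketch below, written in PyTorch, is a minimal illustration of that loop. The reverse_model callable is a hypothetical placeholder for the learned reverse transition that later sections will construct; here it is simply assumed to return the mean and variance (as tensors) of p(xt−1∣xt).

```python
import torch

def sample(reverse_model, shape, T):
    """Ancestral sampling sketch: walk the reverse chain from x_T ~ N(0, I) down to x_0.

    `reverse_model(x_t, t)` is a hypothetical stand-in for the learned reverse
    transition; it is assumed to return the mean and variance of p(x_{t-1} | x_t).
    """
    x_t = torch.randn(shape)                       # start from pure noise: x_T ~ N(0, I)
    for t in range(T, 0, -1):                      # t = T, T-1, ..., 1
        mean, var = reverse_model(x_t, t)          # parameters of p(x_{t-1} | x_t)
        if t > 1:
            noise = torch.randn_like(x_t)
            x_t = mean + torch.sqrt(var) * noise   # sample x_{t-1} from the Gaussian
        else:
            x_t = mean                             # the final step (t = 1 -> 0) is usually taken without added noise
    return x_t                                     # x_0: the generated sample
```

Everything about the quality of the generated x0 hinges on how well that mean and variance approximate the true reverse transitions, which is what the rest of the chapter addresses.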
Therefore, the central challenge in building a diffusion model is to estimate or parameterize these reverse conditional probabilities p(xt−1∣xt). The forward process q(xt∣xt−1) was designed to be mathematically convenient (adding Gaussian noise). As we will see in the following sections, the reverse transition conditioned on the original data, q(xt−1∣xt, x0), has a closed form, but computing the desired q(xt−1∣xt) directly would require knowing the entire data distribution, which is exactly what we are trying to learn. This intractability motivates the use of powerful function approximators, specifically neural networks, to learn a model pθ(xt−1∣xt) that approximates the true reverse transitions.
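To see where the difficulty comes from, apply Bayes' rule to a single reverse step:

$$
q(x_{t-1} \mid x_t) = \frac{q(x_t \mid x_{t-1})\, q(x_{t-1})}{q(x_t)}.
$$

The marginals q(xt−1) and q(xt) are averages over the unknown data distribution q(x0), which is why this quantity is intractable. Conditioning on x0 removes that dependence: assuming the usual notation αt = 1 − βt and ᾱt = ∏s≤t αs, the posterior (derived in the following sections) is the Gaussian

$$
q(x_{t-1} \mid x_t, x_0) = \mathcal{N}\!\left(x_{t-1};\ \tilde{\mu}_t(x_t, x_0),\ \tilde{\beta}_t \mathbf{I}\right),
\qquad
\tilde{\beta}_t = \frac{1-\bar{\alpha}_{t-1}}{1-\bar{\alpha}_t}\,\beta_t,
$$

where the mean \tilde{\mu}_t is a fixed weighted combination of x0 and xt.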
Our goal is set: learn a model that can predict the previous state xt−1 given the current state xt, enabling us to walk backward along the chain from noise to data. The next sections will detail how we formulate and train a neural network to perform this task.
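As a preview of that formulation, the learned reverse transition is commonly parameterized as a Gaussian whose mean (and, in some variants, variance) is produced by a neural network with parameters θ:

$$
p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\!\left(x_{t-1};\ \mu_\theta(x_t, t),\ \Sigma_\theta(x_t, t)\right).
$$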