We've established that the forward diffusion process gradually adds noise to data over many timesteps, forming a Markov chain. Let's now look at the precise mathematical definition of a single step in this chain. How do we get from a state $x_{t-1}$ at timestep $t-1$ to the next state $x_t$ at timestep $t$?
The transition is defined by the conditional probability distribution $q(x_t \mid x_{t-1})$. In standard diffusion models like Denoising Diffusion Probabilistic Models (DDPM), this transition is modeled as the addition of a small amount of Gaussian noise. The amount of noise added at each step is controlled by a predetermined variance schedule, denoted $\beta_t$, where $t$ ranges from $1$ to $T$ (the total number of diffusion steps).
Specifically, the distribution $q(x_t \mid x_{t-1})$ is defined as a Gaussian distribution whose mean depends on the previous state $x_{t-1}$ and whose variance is given by $\beta_t$:
$$
q(x_t \mid x_{t-1}) = \mathcal{N}\left(x_t;\ \sqrt{1-\beta_t}\, x_{t-1},\ \beta_t \mathbf{I}\right)
$$

Let's break down this equation:
This formula tells us that $x_t$ is centered around a slightly scaled-down version of $x_{t-1}$ (scaled by $\sqrt{1-\beta_t}$), with added noise controlled by $\beta_t$. Because $\beta_t$ is small, $\sqrt{1-\beta_t}$ is slightly less than 1, so the signal from $x_{t-1}$ is mostly preserved, but noise is introduced. For example, with $\beta_t = 0.01$ the scaling factor is $\sqrt{0.99} \approx 0.995$: the signal is barely attenuated, while noise with variance $0.01$ is added.
For convenience, it's common to define $\alpha_t = 1 - \beta_t$. Since $\beta_t$ is small and positive, $\alpha_t$ is slightly less than 1. Using this notation, the equation becomes:
$$
q(x_t \mid x_{t-1}) = \mathcal{N}\left(x_t;\ \sqrt{\alpha_t}\, x_{t-1},\ (1-\alpha_t) \mathbf{I}\right)
$$

This form highlights that the new state $x_t$ is a combination of the scaled previous state $\sqrt{\alpha_t}\, x_{t-1}$ and newly added noise with variance $1-\alpha_t = \beta_t$.
We can express the process of sampling $x_t$ from $x_{t-1}$ using the reparameterization trick. If $\epsilon_{t-1}$ is a random variable drawn from a standard Gaussian distribution $\mathcal{N}(0, \mathbf{I})$, then we can write $x_t$ as:
$$
x_t = \sqrt{\alpha_t}\, x_{t-1} + \sqrt{1-\alpha_t}\,\epsilon_{t-1}, \qquad \epsilon_{t-1} \sim \mathcal{N}(0, \mathbf{I})
$$

This formulation is particularly useful for implementation, as it cleanly separates the deterministic part (scaling $x_{t-1}$) from the stochastic part (adding scaled standard Gaussian noise).
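To make this concrete, here is a minimal sketch of one forward step in Python using PyTorch. The function name `forward_diffusion_step` is an illustrative choice, not part of any library's API:

```python
import math
import torch

def forward_diffusion_step(x_prev: torch.Tensor, beta_t: float) -> torch.Tensor:
    """Sample x_t ~ q(x_t | x_{t-1}) for a single forward diffusion step.

    Implements x_t = sqrt(alpha_t) * x_{t-1} + sqrt(1 - alpha_t) * eps,
    where alpha_t = 1 - beta_t and eps ~ N(0, I).
    """
    alpha_t = 1.0 - beta_t
    eps = torch.randn_like(x_prev)  # standard Gaussian noise, same shape as x_{t-1}
    return math.sqrt(alpha_t) * x_prev + math.sqrt(1.0 - alpha_t) * eps
```

A call like `forward_diffusion_step(x_prev, beta_t=0.01)` produces one sample of $x_t$. Because the noise is drawn elementwise, the same function works for images or any other tensor shape.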
The sequence of values $\beta_1, \beta_2, \dots, \beta_T$ (or equivalently $\alpha_1, \alpha_2, \dots, \alpha_T$) constitutes the noise schedule. The choice of this schedule is an important design decision that affects the diffusion process and model performance. We will examine different scheduling strategies in the next section.
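Although scheduling strategies are covered in the next section, a small example helps fix ideas. One common choice, used in the original DDPM paper, is a linear schedule in which $\beta_t$ grows from a small starting value to a larger final one:

```python
import torch

T = 1000                               # total number of diffusion steps
# Linear schedule from the DDPM paper: beta grows from 1e-4 to 0.02.
betas = torch.linspace(1e-4, 0.02, T)  # betas[t-1] corresponds to beta_t
alphas = 1.0 - betas                   # alpha_t = 1 - beta_t
```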
For now, the important takeaway is this single-step transition formula. It's the fundamental building block of the entire forward diffusion process, defining precisely how noise is incrementally added at each step of the Markov chain. Understanding this equation is necessary for grasping both the properties of the forward process and how the reverse (denoising) process is formulated later.
Let's visualize this for a single one-dimensional data point transitioning from $x_{t-1}$ to $x_t$.

Diagram illustrating a single step $x_{t-1} \to x_t$: the point $x_{t-1}$ is scaled down to $\sqrt{\alpha_t}\, x_{t-1}$, and then Gaussian noise with variance $\beta_t = 1 - \alpha_t$ is added, resulting in the new state $x_t$.
This step-by-step addition of controlled noise ensures that if we repeat this process for $T$ steps, the resulting $x_T$ will closely resemble pure noise, effectively destroying the original data structure. The next section will discuss how the $\beta_t$ values are chosen in the noise schedule, and later sections will show how we can derive a formula to jump directly from $x_0$ to any $x_t$.
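As a quick numerical check of this claim, the sketch below chains the hypothetical `forward_diffusion_step` function and `betas` schedule from the earlier examples; after $T$ steps the sample statistics are close to those of a standard Gaussian:

```python
# Apply the single-step transition T times to a batch of identical points.
x = torch.full((10_000,), 2.0)    # every sample starts at x_0 = 2.0
for beta_t in betas:              # reuses the linear schedule defined above
    x = forward_diffusion_step(x, beta_t.item())

# After T steps, x_T is approximately N(0, 1): mean near 0, std near 1.
print(f"mean: {x.mean().item():.3f}  std: {x.std().item():.3f}")
```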