In the previous sections, we established the forward diffusion process as a step-by-step Markov chain where a small amount of Gaussian noise is added at each timestep $t$, governed by the variance schedule $\beta_t$. The transition is defined as $q(x_t \mid x_{t-1})$. While this defines the process, simulating it step by step to get a noisy sample $x_t$ from an initial data point $x_0$ can be computationally expensive, especially for large $T$. Fortunately, there's a more direct way.
A significant property of this specific noising process is that we can derive a closed-form equation to sample $x_t$ directly from $x_0$ for any timestep $t$, without needing to compute all the intermediate states $x_1, x_2, \dots, x_{t-1}$. This is extremely useful, particularly during the training phase of the diffusion model.
Let's derive this relationship. Recall the single-step transition:
$$x_t = \sqrt{1-\beta_t}\, x_{t-1} + \sqrt{\beta_t}\, \epsilon_{t-1}, \quad \text{where } \epsilon_{t-1} \sim \mathcal{N}(0, I)$$

For convenience, let's define $\alpha_t = 1 - \beta_t$. The equation becomes:
$$x_t = \sqrt{\alpha_t}\, x_{t-1} + \sqrt{1-\alpha_t}\, \epsilon_{t-1}$$

Now, we can recursively expand this expression. Let's look at $x_{t-1}$ in terms of $x_{t-2}$:
$$x_{t-1} = \sqrt{\alpha_{t-1}}\, x_{t-2} + \sqrt{1-\alpha_{t-1}}\, \epsilon_{t-2}$$

Substituting this into the equation for $x_t$:
$$\begin{aligned}
x_t &= \sqrt{\alpha_t}\left(\sqrt{\alpha_{t-1}}\, x_{t-2} + \sqrt{1-\alpha_{t-1}}\, \epsilon_{t-2}\right) + \sqrt{1-\alpha_t}\, \epsilon_{t-1} \\
    &= \sqrt{\alpha_t \alpha_{t-1}}\, x_{t-2} + \sqrt{\alpha_t(1-\alpha_{t-1})}\, \epsilon_{t-2} + \sqrt{1-\alpha_t}\, \epsilon_{t-1}
\end{aligned}$$

Notice a pattern emerging. The term multiplying $x_{t-2}$ is the product of the square roots of the $\alpha$ values. The noise terms are also accumulating. We can leverage a property of Gaussian distributions: adding two independent Gaussian variables results in another Gaussian variable. Specifically, if $Z_1 \sim \mathcal{N}(0, \sigma_1^2 I)$ and $Z_2 \sim \mathcal{N}(0, \sigma_2^2 I)$ are independent, then $Z_1 + Z_2 \sim \mathcal{N}(0, (\sigma_1^2 + \sigma_2^2) I)$.
In our expansion, $\epsilon_{t-1}$ and $\epsilon_{t-2}$ are independent standard Gaussian noises ($\mathcal{N}(0, I)$). The combined noise term $\sqrt{\alpha_t(1-\alpha_{t-1})}\, \epsilon_{t-2} + \sqrt{1-\alpha_t}\, \epsilon_{t-1}$ is also Gaussian. Its variance is

$$\alpha_t(1-\alpha_{t-1})\,\mathrm{Var}(\epsilon_{t-2}) + (1-\alpha_t)\,\mathrm{Var}(\epsilon_{t-1}) = \alpha_t(1-\alpha_{t-1}) + (1-\alpha_t) = \alpha_t - \alpha_t\alpha_{t-1} + 1 - \alpha_t = 1 - \alpha_t\alpha_{t-1}.$$

So, we can rewrite the combined noise term as $\sqrt{1-\alpha_t\alpha_{t-1}}\, \bar{\epsilon}_{t-2}$, where $\bar{\epsilon}_{t-2} \sim \mathcal{N}(0, I)$.
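To make this merging step concrete, here is a quick Monte Carlo check (a minimal sketch using arbitrarily chosen values for $\alpha_t$ and $\alpha_{t-1}$, not values from the text): the sum of the two scaled, independent noise terms has empirical variance very close to $1 - \alpha_t\alpha_{t-1}$.

```python
import numpy as np

# Monte Carlo check of the merged-noise variance (illustrative values only)
rng = np.random.default_rng(0)
alpha_t, alpha_tm1 = 0.98, 0.97                 # hypothetical alpha_t, alpha_{t-1}

eps_tm2 = rng.standard_normal(1_000_000)        # epsilon_{t-2} ~ N(0, 1)
eps_tm1 = rng.standard_normal(1_000_000)        # epsilon_{t-1} ~ N(0, 1)

combined = (np.sqrt(alpha_t * (1 - alpha_tm1)) * eps_tm2
            + np.sqrt(1 - alpha_t) * eps_tm1)

print(combined.var())               # empirical variance of the merged noise
print(1 - alpha_t * alpha_tm1)      # predicted variance: 1 - alpha_t * alpha_{t-1}
```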
Substituting this merged noise term back into the expansion leads to:
$$x_t = \sqrt{\alpha_t\alpha_{t-1}}\, x_{t-2} + \sqrt{1-\alpha_t\alpha_{t-1}}\, \bar{\epsilon}_{t-2}$$

If we continue this expansion all the way back to $x_0$, we introduce the cumulative product notation $\bar{\alpha}_t = \prod_{i=1}^{t} \alpha_i$. The general form becomes:
$$x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1-\bar{\alpha}_t}\, \epsilon, \quad \text{where } \epsilon \sim \mathcal{N}(0, I)$$

This elegant equation is a cornerstone of diffusion models. It tells us that any noisy version $x_t$ can be obtained directly from the original data $x_0$ by scaling $x_0$ down by $\sqrt{\bar{\alpha}_t}$, generating a standard Gaussian noise vector $\epsilon$, scaling it by $\sqrt{1-\bar{\alpha}_t}$, and adding the two results.
Equivalently, we can say that the conditional distribution $q(x_t \mid x_0)$ is a Gaussian distribution:
$$q(x_t \mid x_0) = \mathcal{N}\!\left(x_t;\ \sqrt{\bar{\alpha}_t}\, x_0,\ (1-\bar{\alpha}_t) I\right)$$

The mean of this distribution is $\sqrt{\bar{\alpha}_t}\, x_0$, and the variance is $(1-\bar{\alpha}_t) I$.
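As a sketch of how this is used in practice, the snippet below builds a variance schedule, computes $\bar{\alpha}_t$ with a cumulative product, and samples $x_t$ from $x_0$ in a single step. The schedule endpoints match the linear schedule described below ($10^{-4}$ to $0.02$ over 1000 steps); the function and variable names are illustrative, not fixed by the text.

```python
import numpy as np

# Minimal sketch: sample x_t from q(x_t | x_0) in one step (no chain simulation)
T = 1000
betas = np.linspace(1e-4, 0.02, T)          # beta_1 ... beta_T (linear schedule)
alphas = 1.0 - betas                        # alpha_t = 1 - beta_t
alpha_bars = np.cumprod(alphas)             # alpha_bar_t = prod_{i<=t} alpha_i

def q_sample(x0, t, rng=np.random.default_rng()):
    """Draw one sample from q(x_t | x_0); t is a 0-based timestep index."""
    eps = rng.standard_normal(x0.shape)     # epsilon ~ N(0, I)
    a_bar = alpha_bars[t]
    return np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * eps

x0 = np.random.default_rng(1).standard_normal((3, 32, 32))  # stand-in for a data sample
x_noisy = q_sample(x0, t=500)               # jump straight to timestep 500
```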
Since $\alpha_t = 1 - \beta_t$ and $\beta_t$ is typically small and positive, $\alpha_t$ is slightly less than 1. The cumulative product $\bar{\alpha}_t$ starts at $\bar{\alpha}_0 = 1$ (by convention) and decreases monotonically towards 0 as $t$ increases towards the total number of steps $T$. Consequently, $\bar{\alpha}_t$ (the "signal rate") decreases from 1 to 0, while $1-\bar{\alpha}_t$ (the "noise rate") increases from 0 to 1. This matches our intuition: as $t$ increases, the contribution of the original data $x_0$ diminishes, and the sample $x_t$ becomes increasingly dominated by noise, eventually approaching a standard Gaussian distribution $\mathcal{N}(0, I)$ when $\bar{\alpha}_T \approx 0$.
This plot shows the typical evolution of the signal rate ($\bar{\alpha}_t$) and noise rate ($1-\bar{\alpha}_t$) over 1000 timesteps using a linear variance schedule from $\beta_1 = 10^{-4}$ to $\beta_{1000} = 0.02$. As $t$ increases, the influence of the original data decreases while the influence of noise increases.
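If you want to reproduce a plot like this yourself, a short matplotlib sketch under the same assumed linear schedule might look like the following:

```python
import numpy as np
import matplotlib.pyplot as plt

# Sketch: signal and noise rates for a linear schedule over 1000 steps
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bars = np.cumprod(1.0 - betas)

t = np.arange(1, T + 1)
plt.plot(t, alpha_bars, label="signal rate")        # alpha_bar_t
plt.plot(t, 1.0 - alpha_bars, label="noise rate")   # 1 - alpha_bar_t
plt.xlabel("timestep t")
plt.legend()
plt.show()
```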
This closed-form expression $x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1-\bar{\alpha}_t}\, \epsilon$ is crucial for training diffusion models efficiently. The goal of training is to learn a neural network $\epsilon_\theta(x_t, t)$ that can predict the noise $\epsilon$ that was added to get $x_t$. To do this, we need training examples consisting of noisy data $x_t$ and the corresponding noise $\epsilon$ that was used to generate it.
Using the formula, we can create these training pairs quickly: pick a clean sample $x_0$, draw a random timestep $t$ and a noise vector $\epsilon \sim \mathcal{N}(0, I)$, and compute $x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1-\bar{\alpha}_t}\, \epsilon$ in a single step, as sketched below.
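Here is one way this batch construction might look. This is a sketch under the same assumed linear schedule, not the book's training code; names such as `make_training_batch` are illustrative.

```python
import numpy as np

# Sketch: build (x_t, epsilon, t) training tuples from a batch of clean samples
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bars = np.cumprod(1.0 - betas)

def make_training_batch(x0_batch, rng=np.random.default_rng()):
    """x0_batch: array of shape (B, ...). Returns noisy samples, noise targets, timesteps."""
    B = x0_batch.shape[0]
    t = rng.integers(0, T, size=B)                      # one random timestep per sample
    eps = rng.standard_normal(x0_batch.shape)           # epsilon ~ N(0, I)
    a_bar = alpha_bars[t].reshape(B, *([1] * (x0_batch.ndim - 1)))
    x_t = np.sqrt(a_bar) * x0_batch + np.sqrt(1.0 - a_bar) * eps
    return x_t, eps, t                                  # eps is the prediction target
```

The network $\epsilon_\theta(x_t, t)$ is then trained to predict `eps` given `x_t` and `t`.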
This ability to directly jump to any timestep $t$ allows for parallelized and efficient generation of training batches, which is much faster than simulating the Markov chain step by step for each training sample. This sampling strategy forms the basis of the training loop we will discuss in Chapter 4.