In the previous sections, we established the forward diffusion process as a step-by-step Markov chain where a small amount of Gaussian noise is added at each timestep $t$, governed by the variance schedule $\beta_t$. The transition is defined as $q(x_t \mid x_{t-1})$. While this defines the process, simulating it step-by-step to get a noisy sample $x_t$ from an initial data point $x_0$ can be computationally expensive, especially for large $T$. Fortunately, there's a more direct way.
A significant property of this specific noising process is that we can derive a closed-form equation to sample $x_t$ directly from $x_0$ for any timestep $t$, without needing to compute all the intermediate states $x_1, x_2, \dots, x_{t-1}$. This is extremely useful, particularly during the training phase of the diffusion model.
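To make the cost of the naive approach concrete, here is a minimal NumPy sketch that simulates the chain one transition at a time (the names `betas` and `forward_step_by_step` are illustrative, not from any particular library). Sampling $x_t$ this way costs $t$ sequential steps:

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)  # linear variance schedule: beta_1 .. beta_T

def forward_step_by_step(x0, t, rng=np.random.default_rng()):
    """Sample x_t by applying q(x_t | x_{t-1}) t times, one step at a time."""
    x = x0
    for i in range(t):  # betas[i] corresponds to beta_{i+1} in the math
        eps = rng.standard_normal(x.shape)
        x = np.sqrt(1.0 - betas[i]) * x + np.sqrt(betas[i]) * eps
    return x
```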
Let's derive this relationship. Recall the single-step transition:
$$x_t = \sqrt{1-\beta_t}\, x_{t-1} + \sqrt{\beta_t}\, \epsilon_{t-1}, \quad \text{where } \epsilon_{t-1} \sim \mathcal{N}(0, I)$$

For convenience, let's define $\alpha_t = 1 - \beta_t$. The equation becomes:
$$x_t = \sqrt{\alpha_t}\, x_{t-1} + \sqrt{1-\alpha_t}\, \epsilon_{t-1}$$

Now, we can recursively expand this expression. Let's look at $x_{t-1}$ in terms of $x_{t-2}$:
$$x_{t-1} = \sqrt{\alpha_{t-1}}\, x_{t-2} + \sqrt{1-\alpha_{t-1}}\, \epsilon_{t-2}$$

Substituting this into the equation for $x_t$:
$$\begin{aligned} x_t &= \sqrt{\alpha_t}\left(\sqrt{\alpha_{t-1}}\, x_{t-2} + \sqrt{1-\alpha_{t-1}}\, \epsilon_{t-2}\right) + \sqrt{1-\alpha_t}\, \epsilon_{t-1} \\ &= \sqrt{\alpha_t \alpha_{t-1}}\, x_{t-2} + \sqrt{\alpha_t(1-\alpha_{t-1})}\, \epsilon_{t-2} + \sqrt{1-\alpha_t}\, \epsilon_{t-1} \end{aligned}$$

Notice a pattern emerging. The term multiplying $x_{t-2}$ is the product of the square roots of the $\alpha$ values. The noise terms are also accumulating. We can leverage a property of Gaussian distributions: adding two independent Gaussian variables results in another Gaussian variable. Specifically, if $Z_1 \sim \mathcal{N}(0, \sigma_1^2 I)$ and $Z_2 \sim \mathcal{N}(0, \sigma_2^2 I)$ are independent, then $Z_1 + Z_2 \sim \mathcal{N}(0, (\sigma_1^2 + \sigma_2^2) I)$.
In our expansion, $\epsilon_{t-1}$ and $\epsilon_{t-2}$ are independent standard Gaussian noises ($\mathcal{N}(0, I)$). The combined noise term $\sqrt{\alpha_t(1-\alpha_{t-1})}\, \epsilon_{t-2} + \sqrt{1-\alpha_t}\, \epsilon_{t-1}$ is therefore also Gaussian. Its variance is

$$\alpha_t(1-\alpha_{t-1})\,\mathrm{Var}(\epsilon_{t-2}) + (1-\alpha_t)\,\mathrm{Var}(\epsilon_{t-1}) = \alpha_t(1-\alpha_{t-1}) + (1-\alpha_t) = \alpha_t - \alpha_t\alpha_{t-1} + 1 - \alpha_t = 1 - \alpha_t\alpha_{t-1}.$$

So, we can rewrite the combined noise term as $\sqrt{1-\alpha_t\alpha_{t-1}}\, \bar{\epsilon}_{t-2}$, where $\bar{\epsilon}_{t-2} \sim \mathcal{N}(0, I)$.
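As a quick numerical sanity check on this variance calculation, we can draw many samples of the combined noise term for arbitrary example values of $\alpha_t$ and $\alpha_{t-1}$ and compare the empirical variance with $1 - \alpha_t\alpha_{t-1}$ (a throwaway sketch, not part of any training code):

```python
import numpy as np

rng = np.random.default_rng(0)
alpha_t, alpha_tm1 = 0.98, 0.99  # arbitrary example values

eps_tm2 = rng.standard_normal(1_000_000)  # plays the role of epsilon_{t-2}
eps_tm1 = rng.standard_normal(1_000_000)  # plays the role of epsilon_{t-1}
combined = (np.sqrt(alpha_t * (1 - alpha_tm1)) * eps_tm2
            + np.sqrt(1 - alpha_t) * eps_tm1)

print(combined.var())           # ~0.0298, up to sampling error
print(1 - alpha_t * alpha_tm1)  # 0.0298, the predicted variance
```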
This leads to:
$$x_t = \sqrt{\alpha_t\alpha_{t-1}}\, x_{t-2} + \sqrt{1-\alpha_t\alpha_{t-1}}\, \bar{\epsilon}_{t-2}$$

If we continue this expansion all the way back to $x_0$, we introduce the cumulative product notation $\bar{\alpha}_t = \prod_{i=1}^{t} \alpha_i$. The general form becomes:
$$x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1-\bar{\alpha}_t}\, \epsilon, \quad \text{where } \epsilon \sim \mathcal{N}(0, I)$$

This is a foundational equation of diffusion models. It tells us that any noisy version $x_t$ can be obtained directly from the original data $x_0$ by scaling $x_0$ down by $\sqrt{\bar{\alpha}_t}$, generating a standard Gaussian noise vector $\epsilon$, scaling it by $\sqrt{1-\bar{\alpha}_t}$, and adding the two results.
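In code, jumping to any timestep becomes a one-liner once the cumulative products $\bar{\alpha}_t$ are precomputed. A minimal sketch (function and variable names are illustrative):

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)   # linear variance schedule
alphas = 1.0 - betas                 # alpha_t = 1 - beta_t
alphas_bar = np.cumprod(alphas)      # alphas_bar[i] = alpha_1 * ... * alpha_{i+1}

def q_sample(x0, t, rng=np.random.default_rng()):
    """Sample x_t ~ q(x_t | x_0) directly, skipping all intermediate steps."""
    eps = rng.standard_normal(x0.shape)
    a_bar = alphas_bar[t - 1]        # t is 1-indexed in the math, 0-indexed here
    return np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * eps
```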
Equivalently, we can say that the conditional distribution $q(x_t \mid x_0)$ is a Gaussian distribution:
$$q(x_t \mid x_0) = \mathcal{N}\left(x_t;\ \sqrt{\bar{\alpha}_t}\, x_0,\ (1-\bar{\alpha}_t) I\right)$$

The mean of this distribution is $\sqrt{\bar{\alpha}_t}\, x_0$, and the variance is $(1-\bar{\alpha}_t) I$.
Since $\alpha_t = 1 - \beta_t$ and $\beta_t$ is typically small and positive, $\alpha_t$ is slightly less than 1. The cumulative product $\bar{\alpha}_t$ starts at $\bar{\alpha}_0 = 1$ (by convention) and decreases monotonically towards 0 as $t$ increases towards the total number of steps $T$. Consequently, $\sqrt{\bar{\alpha}_t}$ (the "signal rate") decreases from 1 to 0, while $\sqrt{1-\bar{\alpha}_t}$ (the "noise rate") increases from 0 to 1. This matches our intuition: as $t$ increases, the contribution of the original data $x_0$ diminishes, and the sample $x_t$ becomes increasingly dominated by noise, eventually approaching a standard Gaussian distribution $\mathcal{N}(0, I)$ when $\bar{\alpha}_T \approx 0$.
This plot shows the typical evolution of the signal rate ($\sqrt{\bar{\alpha}_t}$) and noise rate ($\sqrt{1-\bar{\alpha}_t}$) over 1000 timesteps using a linear variance schedule from $\beta_1 = 10^{-4}$ to $\beta_{1000} = 0.02$. As $t$ increases, the influence of the original data decreases while the influence of noise increases.
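The curves in such a plot can be reproduced in a few lines. A sketch using NumPy and matplotlib with the same linear schedule:

```python
import numpy as np
import matplotlib.pyplot as plt

T = 1000
betas = np.linspace(1e-4, 0.02, T)     # beta_1 = 1e-4, beta_1000 = 0.02
alphas_bar = np.cumprod(1.0 - betas)   # cumulative product alpha_bar_t

t = np.arange(1, T + 1)
plt.plot(t, np.sqrt(alphas_bar), label="signal rate")
plt.plot(t, np.sqrt(1.0 - alphas_bar), label="noise rate")
plt.xlabel("timestep t")
plt.legend()
plt.show()
```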
This closed-form expression $x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1-\bar{\alpha}_t}\, \epsilon$ is important for training diffusion models efficiently. The goal of training is to learn a neural network $\epsilon_\theta(x_t, t)$ that can predict the noise $\epsilon$ that was added to get $x_t$. To do this, we need training examples consisting of noisy data $x_t$ and the corresponding noise $\epsilon$ that was used to generate it.
Using the formula, we can create these training pairs quickly:

1. Sample a clean data point $x_0$ from the training set.
2. Sample a timestep $t$ uniformly from $\{1, \dots, T\}$.
3. Sample a noise vector $\epsilon \sim \mathcal{N}(0, I)$.
4. Compute $x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1-\bar{\alpha}_t}\, \epsilon$; the pair $(x_t, t)$ is the network's input and $\epsilon$ is its prediction target.

A batched version of this procedure is sketched below.
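Here is one way the steps above might be assembled into a batch (names like `make_training_batch` and `dataset` are illustrative; `dataset` stands for any NumPy array of clean examples, and `alphas_bar` is the precomputed cumulative product from earlier):

```python
import numpy as np

rng = np.random.default_rng()

def make_training_batch(dataset, alphas_bar, batch_size=128):
    """Return (x_t, t, eps): noisy inputs, timesteps, and target noise."""
    idx = rng.integers(0, len(dataset), size=batch_size)
    x0 = dataset[idx]                                          # clean samples
    t = rng.integers(1, len(alphas_bar) + 1, size=batch_size)  # uniform t in {1..T}
    eps = rng.standard_normal(x0.shape)                        # noise to predict
    # Reshape alphas_bar[t-1] so it broadcasts over the data dimensions.
    a_bar = alphas_bar[t - 1].reshape(-1, *([1] * (x0.ndim - 1)))
    xt = np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * eps
    return xt, t, eps
```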
This ability to directly jump to any timestep $t$ allows for parallelized and efficient generation of training batches, which is much faster than simulating the Markov chain step-by-step for each training sample. This sampling strategy forms the basis of the training loop we will discuss in Chapter 4.