The forward diffusion process is a Markov chain that gradually injects noise into data over discrete timesteps. A significant aspect of this process is controlling how much noise is added at each step. This isn't done with random amounts; instead, a carefully defined plan known as the noise schedule is followed.
The noise schedule is a sequence of variance values, denoted as $\beta_1, \beta_2, \dots, \beta_T$. Each $\beta_t$ determines the variance of the Gaussian noise added when transitioning from state $x_{t-1}$ to state $x_t$. These values are hyperparameters, meaning they are chosen before training the model; they are not learned during the training process itself.
Think of the noise schedule as setting the "intensity" of the noising process at each step. The values of $\beta_t$ are typically chosen such that:

- they are small ($0 < \beta_t < 1$), so each individual step adds only a modest amount of noise, and
- they increase with $t$ ($\beta_1 < \beta_2 < \dots < \beta_T$), so later steps add progressively more noise.
The original Denoising Diffusion Probabilistic Models (DDPM) paper proposed a linear schedule, where $\beta_t$ increases linearly from a small starting value (e.g., $\beta_1 = 10^{-4}$) to a larger ending value (e.g., $\beta_T = 0.02$) over $T$ steps (often $T = 1000$).
The formula for a linear schedule is:

$$\beta_t = \beta_1 + \frac{t-1}{T-1}\left(\beta_T - \beta_1\right)$$
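As a concrete illustration, the linear schedule amounts to a single call to `np.linspace`. The function name and default values below are illustrative choices matching the DDPM settings mentioned above:

```python
import numpy as np

def linear_beta_schedule(timesteps=1000, beta_start=1e-4, beta_end=0.02):
    """Linearly interpolate beta_t from beta_start to beta_end over T steps."""
    return np.linspace(beta_start, beta_end, timesteps)

betas = linear_beta_schedule()
```

Each entry `betas[t-1]` is the variance $\beta_t$ used at timestep $t$.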
While simple and effective, other schedules have been developed. A popular alternative is the cosine schedule, introduced in the "Improved Denoising Diffusion Probabilistic Models" paper. This schedule changes more slowly near the beginning and end of the process, potentially leading to better performance and preventing the signal from being destroyed too quickly early on.
The cosine schedule is defined using the related quantities $\alpha_t = 1 - \beta_t$ and $\bar{\alpha}_t = \prod_{i=1}^{t} \alpha_i$. It sets $\bar{\alpha}_t$ based on a cosine function and then derives the corresponding $\beta_t$:

$$\bar{\alpha}_t = \frac{f(t)}{f(0)}, \qquad f(t) = \cos^2\!\left(\frac{t/T + s}{1 + s} \cdot \frac{\pi}{2}\right)$$
Here, $s$ is a small offset (e.g., $s = 0.008$) to prevent $\beta_t$ from being too small near $t = 0$. Once $\bar{\alpha}_t$ is calculated for all $t$, the individual $\beta_t$ values can be recovered:

$$\beta_t = 1 - \frac{\bar{\alpha}_t}{\bar{\alpha}_{t-1}}$$
The choice of schedule impacts how quickly information from the original data is obscured. A schedule that adds noise too aggressively early on might make the reverse process harder to learn. Conversely, adding too little noise overall might not sufficiently transform the data into a simple prior distribution by the final step $T$.
Let's visualize how these two common schedules compare for $T = 1000$ steps, with $\beta_1 = 10^{-4}$ and $\beta_T = 0.02$ for the linear schedule, and $s = 0.008$ for the cosine schedule, derived to approximately match the noise level at $t = T$.
Variance ($\beta_t$) added at each timestep for linear and cosine schedules over $T = 1000$ steps. The cosine schedule adds noise more slowly initially and accelerates towards the end compared to the linear schedule.
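The same comparison can be made numerically. The self-contained sketch below computes both schedules with the hyperparameters listed above and prints how much of the original signal, $\sqrt{\bar{\alpha}_t}$, each schedule retains at a few illustrative timesteps:

```python
import numpy as np

T = 1000

# Linear schedule: beta from 1e-4 to 0.02.
betas_lin = np.linspace(1e-4, 0.02, T)

# Cosine schedule with offset s = 0.008, clipped as in the Improved DDPM paper.
s = 0.008
steps = np.arange(T + 1)
f = np.cos(((steps / T) + s) / (1 + s) * np.pi / 2) ** 2
betas_cos = np.clip(1.0 - f[1:] / f[:-1], 0.0, 0.999)

# Cumulative signal retention sqrt(alpha_bar_t) for each schedule.
signal_lin = np.sqrt(np.cumprod(1.0 - betas_lin))
signal_cos = np.sqrt(np.cumprod(1.0 - betas_cos))

for t in (100, 250, 500, 750):
    print(f"t={t}: linear {signal_lin[t - 1]:.3f}, cosine {signal_cos[t - 1]:.3f}")
```

Running this shows the cosine schedule preserving noticeably more of the signal in the early and middle timesteps, consistent with the description above.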
Recall from the previous section that the forward process step is defined by the conditional probability $q(x_t \mid x_{t-1})$. This transition is defined as adding Gaussian noise with a specific mean and variance:

$$q(x_t \mid x_{t-1}) = \mathcal{N}\!\left(x_t;\ \sqrt{1 - \beta_t}\, x_{t-1},\ \beta_t \mathbf{I}\right)$$
Here, the noise schedule value $\beta_t$ directly sets the variance of the Gaussian noise added at timestep $t$. The mean is scaled by $\sqrt{1 - \beta_t}$ to ensure the overall variance of the data doesn't explode. Since $\beta_t$ is small, $\sqrt{1 - \beta_t}$ is slightly less than 1, gradually shrinking the contribution of the previous state $x_{t-1}$.
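A single forward step can be sampled directly from this definition via the reparameterization $x_t = \sqrt{1 - \beta_t}\, x_{t-1} + \sqrt{\beta_t}\, \epsilon$ with $\epsilon \sim \mathcal{N}(0, \mathbf{I})$. The helper below is a minimal sketch (the name `forward_step` is a hypothetical choice, not from the source):

```python
import numpy as np

def forward_step(x_prev, beta_t, rng=None):
    """Sample x_t ~ q(x_t | x_{t-1}) = N(sqrt(1 - beta_t) * x_{t-1}, beta_t * I)."""
    rng = np.random.default_rng(0) if rng is None else rng
    noise = rng.standard_normal(x_prev.shape)
    return np.sqrt(1.0 - beta_t) * x_prev + np.sqrt(beta_t) * noise
```

Applied repeatedly with $t = 1, \dots, T$, this produces the full forward trajectory from data to (approximately) pure Gaussian noise.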
In summary, the Gaussian noise schedule is a sequence of pre-defined hyperparameters controlling the magnitude of noise added at each step of the forward diffusion process. Its design (e.g., linear, cosine) and the range of $\beta_t$ values are important choices that influence the trajectory from data to noise and affect the subsequent learning of the reverse process. We will see later how these values, and related quantities derived from them, appear in both the training objective and the sampling process.