While standard noise schedules like linear and cosine provide a good starting point and are widely used, they are not necessarily optimal for every dataset or generative task. Given the limitations of these fixed schedules, the natural next step is to design custom noise schedules tailored to specific requirements. The goal is to control the rate at which noise is added during the forward process, thereby shaping the reverse denoising process the model must learn.
Recall that the forward process variance schedule, typically denoted $\beta_t$ for timesteps $t = 1, \dots, T$, dictates the entire diffusion process. From $\beta_t$, we derive $\alpha_t = 1 - \beta_t$ and the cumulative product $\bar{\alpha}_t = \prod_{i=1}^{t} \alpha_i$. The choice of $\beta_t$ values directly impacts how quickly information from the initial data $x_0$ is obscured. A schedule that adds noise too quickly might destroy fine details early on, making it harder for the model to recover them. Conversely, a schedule that adds noise too slowly might require a very large number of timesteps $T$ or lead to inefficient learning in later stages where the signal is already weak.
Designing a custom schedule often involves defining a function or a sequence for βt that deviates from the standard linear or cosine forms.
Approaches to designing custom schedules
Instead of relying on predefined formulas like linear or cosine, we can define βt using other functional forms or rules:
Polynomial Schedules: Generalize the linear schedule $\beta_t \propto t$ or the cosine schedule. We can define $\beta_t$ using higher-order polynomials, for instance:

$$\beta_t = c_1 \left(\frac{t}{T}\right)^p + c_0$$

where $p$ is the polynomial degree, and $c_1, c_0$ are constants chosen to keep $\beta_t$ within a desired range (e.g., $10^{-4}$ to $0.02$) and to maintain monotonicity. Varying $p$ allows for different curvatures in the noise addition rate.
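As a minimal sketch, such a polynomial schedule could be computed as follows. The function name, the default degree $p=2$, and the endpoint values are illustrative choices, not fixed conventions:

```python
import numpy as np

def polynomial_beta_schedule(T=1000, p=2.0, beta_start=1e-4, beta_end=0.02):
    """Hypothetical polynomial schedule: beta_t = c1 * (t/T)^p + c0.

    c0 and c1 are chosen so that beta_t starts near beta_start and
    reaches exactly beta_end at t = T, staying monotonically increasing.
    """
    t = np.arange(1, T + 1)
    c0 = beta_start
    c1 = beta_end - beta_start
    return c1 * (t / T) ** p + c0
```

Setting `p=1.0` recovers a linear-style ramp, while larger `p` keeps early betas small and concentrates noise addition toward the end of the process.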
Piecewise Schedules: Define different functions for βt over different intervals of t. For example, a schedule could be linear for the first T/2 steps and then transition to a cosine-like decay for the remaining steps. This allows fine-grained control over noise addition at different phases of the diffusion process.
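One way to realize such a piecewise design is sketched below. The breakpoint at $T/2$, the intermediate value `beta_mid`, and the interpretation of the second segment as a cosine-shaped ramp are all illustrative assumptions:

```python
import numpy as np

def piecewise_beta_schedule(T=1000, beta_start=1e-4, beta_mid=0.01, beta_end=0.02):
    """Illustrative piecewise schedule: a linear ramp for the first T//2
    steps, then a cosine-shaped easing from beta_mid to beta_end for the
    remaining steps. The segments meet continuously at beta_mid."""
    T1 = T // 2
    first = np.linspace(beta_start, beta_mid, T1)
    # cosine-shaped easing: (1 - cos(pi*s)) / 2 rises smoothly from 0 to 1
    s = np.linspace(0.0, 1.0, T - T1)
    second = beta_mid + (beta_end - beta_mid) * (1 - np.cos(np.pi * s)) / 2
    return np.concatenate([first, second])
```

Because both segments evaluate to `beta_mid` at the junction, the schedule stays continuous and non-decreasing, avoiding abrupt jumps in the noise rate.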
Signal-to-Noise Ratio (SNR) Based Design: A more principled approach involves designing the schedule based on the desired Signal-to-Noise Ratio (SNR) at each timestep t. The SNR is defined as:
$$\mathrm{SNR}(t) = \frac{\mathbb{E}[(\sqrt{\bar{\alpha}_t}\, x_0)^2]}{\mathbb{E}[(\sqrt{1-\bar{\alpha}_t}\, \epsilon)^2]} = \frac{\bar{\alpha}_t\, \mathbb{E}[x_0^2]}{(1-\bar{\alpha}_t)\, \mathbb{E}[\epsilon^2]}$$

Assuming $\mathbb{E}[x_0^2] \approx 1$ after normalization and $\mathbb{E}[\epsilon^2] = 1$, this simplifies to $\mathrm{SNR}(t) \approx \frac{\bar{\alpha}_t}{1-\bar{\alpha}_t}$. We can work backward: define a target $\mathrm{SNR}(t)$ function (e.g., exponentially decaying), solve for the required $\bar{\alpha}_t = \frac{\mathrm{SNR}(t)}{1+\mathrm{SNR}(t)}$, and then recover $\beta_t = 1 - \bar{\alpha}_t / \bar{\alpha}_{t-1}$. This connects the schedule design directly to the information content remaining at each step. For example, ensuring the SNR drops smoothly might lead to more stable training.
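This backward derivation can be sketched as follows, using an exponentially decaying target SNR (equivalently, a linearly decreasing log-SNR). The endpoint values `snr_max` and `snr_min` are illustrative:

```python
import numpy as np

def betas_from_snr(T=1000, snr_max=1e4, snr_min=1e-4):
    """Derive beta_t from a target SNR curve (exponential decay here is an
    illustrative choice). Uses SNR(t) = abar_t / (1 - abar_t), so
    abar_t = SNR(t) / (1 + SNR(t)) and beta_t = 1 - abar_t / abar_{t-1}."""
    t = np.arange(1, T + 1)
    # log-SNR decreases linearly, i.e. SNR decays exponentially
    log_snr = np.log(snr_max) + (np.log(snr_min) - np.log(snr_max)) * (t - 1) / (T - 1)
    snr = np.exp(log_snr)
    abar = snr / (1.0 + snr)
    abar_prev = np.concatenate([[1.0], abar[:-1]])  # abar_0 = 1 (no noise yet)
    betas = 1.0 - abar / abar_prev
    return np.clip(betas, 1e-8, 0.999)  # guard against numerical edge cases
```

Since $\bar{\alpha}_t = \prod_{i \le t} (1 - \beta_i)$ telescopes, taking the cumulative product of `1 - betas` recovers the target $\bar{\alpha}_t$ curve, which is an easy sanity check.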
Logarithmic Schedules: Schedules can also be defined in log-space, often by parameterizing the log-SNR directly. This can provide better control over the dynamics, especially when $\bar{\alpha}_t$ takes very small or very large values.
Let's compare the cumulative noise level, represented by $1-\bar{\alpha}_t$ (which indicates how much noise dominates the signal), for different schedule types over $T=1000$ timesteps. A higher value means more noise.
A comparison showing how quickly noise accumulates under linear, cosine, and a hypothetical custom (quadratic-like) schedule. The custom schedule adds noise slowly initially, then accelerates, compared to the cosine schedule which adds noise faster early on.
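This comparison can be reproduced numerically. Below, a quadratic schedule stands in for the hypothetical "custom" curve, and the cosine curve follows the common squared-cosine parameterization of $\bar{\alpha}_t$; the $10^{-4}$ to $0.02$ endpoints are the usual convention, not a requirement:

```python
import numpy as np

T = 1000

def linear_betas(T):
    return np.linspace(1e-4, 0.02, T)

def cosine_abar(T, s=0.008):
    # squared-cosine parameterization of abar_t, normalized so abar_0 = 1
    steps = np.arange(T + 1)
    f = np.cos((steps / T + s) / (1 + s) * np.pi / 2) ** 2
    return (f / f[0])[1:]

def quadratic_betas(T):
    # hypothetical quadratic member of the polynomial schedule family
    t = np.arange(1, T + 1)
    return 1e-4 + (0.02 - 1e-4) * (t / T) ** 2

# cumulative noise level 1 - abar_t for each schedule
noise_linear = 1 - np.cumprod(1 - linear_betas(T))
noise_cosine = 1 - cosine_abar(T)
noise_quadratic = 1 - np.cumprod(1 - quadratic_betas(T))
```

Inspecting early timesteps (e.g., $t = 100$) shows the quadratic schedule has accumulated the least noise, consistent with the figure: it adds noise slowly at first and accelerates later, while all three curves approach 1 by $t = T$.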
When implementing a custom schedule, you typically need to pre-compute the $\beta_t$, $\alpha_t$, and $\bar{\alpha}_t$ values for $t = 1, \dots, T$. These are then stored and used during both training (for sampling $x_t$ given $x_0$) and inference (for the denoising steps).
Key steps involve:

1. Defining the $\beta_t$ sequence for $t = 1, \dots, T$, either from a closed-form expression or by inverting a target SNR curve.
2. Computing $\alpha_t = 1 - \beta_t$ and the cumulative products $\bar{\alpha}_t$.
3. Checking that every $\beta_t$ lies strictly between 0 and 1 and that $\bar{\alpha}_t$ decreases monotonically.
4. Storing these arrays so they are available to both the training loop and the sampler.
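The precomputation can be sketched framework-agnostically with NumPy; in a real model you would typically register these arrays as buffers in your framework of choice:

```python
import numpy as np

def precompute_schedule(betas):
    """Precompute the quantities used during training and sampling from a
    given beta_t sequence. A sketch; buffer handling is framework-specific."""
    betas = np.clip(np.asarray(betas, dtype=np.float64), 1e-8, 0.999)
    alphas = 1.0 - betas
    alphas_bar = np.cumprod(alphas)
    return {
        "betas": betas,
        "alphas": alphas,
        "alphas_bar": alphas_bar,
        # terms used when sampling x_t = sqrt(abar_t) x0 + sqrt(1 - abar_t) eps
        "sqrt_alphas_bar": np.sqrt(alphas_bar),
        "sqrt_one_minus_alphas_bar": np.sqrt(1.0 - alphas_bar),
    }
```

The two square-root arrays are precomputed because the training loop evaluates them at every sampled timestep, so caching avoids redundant work.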
Evaluating a custom noise schedule requires empirical testing. Train your diffusion model using the new schedule and compare it against a baseline trained with a standard schedule on metrics such as sample quality (e.g., FID), training stability and loss behavior, and convergence speed.
Designing custom schedules is often an iterative process of proposing a schedule, training, evaluating, and refining. There's a trade-off between the complexity of the schedule design and the potential gains. While standard schedules work well generally, a carefully designed custom schedule can provide significant advantages for specific applications or push the boundaries of sample quality. This understanding paves the way for exploring schedules that are not just designed, but learned, which we will cover in the next section.
© 2025 ApX Machine Learning