Diffusion models operate by progressively adding noise to data and then learning to reverse this process. While models like DDPM are often presented using discrete time steps, a more general and powerful mathematical framework arises when we consider the continuous-time limit of this noising process. This leads us to the language of Stochastic Differential Equations (SDEs).
Understanding the SDE formulation provides deeper insights into why diffusion models work and unifies various discrete-time diffusion approaches under a single mathematical structure. It also opens doors to more flexible noise scheduling and sampling techniques.
Recall the discrete forward process in DDPM, where noise is added incrementally at each step $t$. If we consider infinitesimally small time steps, this sequence of transformations converges to a continuous stochastic process. An SDE describes the evolution of a variable $\mathbf{x}(t)$ over continuous time, incorporating both deterministic change (drift) and random fluctuations (diffusion).
A general Itô SDE takes the form:

$$d\mathbf{x} = f(\mathbf{x}, t)\,dt + g(t)\,d\mathbf{w}$$

Here:

- $\mathbf{x}$ is the state (for us, a data point such as an image) at time $t$.
- $f(\mathbf{x}, t)$ is the drift coefficient, describing the deterministic part of the change.
- $g(t)$ is the diffusion coefficient, scaling the magnitude of the random fluctuations.
- $\mathbf{w}$ is a standard Wiener process (Brownian motion), so $d\mathbf{w}$ contributes infinitesimal Gaussian noise.
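To make this concrete, a numerical simulation of such an SDE discretizes time and applies, at each small step $\Delta t$, the update $\mathbf{x} \leftarrow \mathbf{x} + f(\mathbf{x}, t)\,\Delta t + g(t)\sqrt{\Delta t}\,\mathbf{z}$ with $\mathbf{z} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$. The sketch below (our illustration, not from the original text) implements this Euler-Maruyama step; `drift` and `diffusion` are placeholder callables standing in for $f$ and $g$:

```python
import torch

def euler_maruyama_step(x, t, dt, drift, diffusion):
    """One Euler-Maruyama step of dx = f(x, t) dt + g(t) dw."""
    z = torch.randn_like(x)  # dw over a step of size dt has std sqrt(dt)
    return x + drift(x, t) * dt + diffusion(t) * (dt ** 0.5) * z
```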
In the context of diffusion models, the forward process transforms complex data into a simple noise distribution (typically Gaussian) as time progresses from $t = 0$ to $t = T$. This "information destruction" process can be modeled by a specific SDE. A common choice, corresponding to the Variance Preserving (VP) SDE often linked to DDPM, is:

$$d\mathbf{x} = -\tfrac{1}{2}\beta(t)\,\mathbf{x}\,dt + \sqrt{\beta(t)}\,d\mathbf{w}$$

Here, $\beta(t)$ is a positive, time-dependent function often called the noise schedule.
As $t$ increases from $0$ to $T$, the influence of the initial data $\mathbf{x}(0)$ diminishes, and the distribution of $\mathbf{x}(T)$ approaches a standard Gaussian, irrespective of $\mathbf{x}(0)$.
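As a sanity check, we can simulate this forward SDE with the Euler-Maruyama scheme sketched above. The example assumes a linear schedule $\beta(t) = \beta_{\min} + t(\beta_{\max} - \beta_{\min})$, a common choice for the VP SDE; the specific constants are illustrative:

```python
import torch

def beta(t, beta_min=0.1, beta_max=20.0):
    # Assumed linear noise schedule on t in [0, 1].
    return beta_min + t * (beta_max - beta_min)

def forward_vp_sde(x0, n_steps=1000, T=1.0):
    """Simulate dx = -0.5 beta(t) x dt + sqrt(beta(t)) dw from t = 0 to t = T."""
    x, dt = x0.clone(), T / n_steps
    for i in range(n_steps):
        t = i * dt
        z = torch.randn_like(x)
        x = x - 0.5 * beta(t) * x * dt + (beta(t) * dt) ** 0.5 * z
    return x

# A batch of "data" far from the origin still ends up near N(0, I):
x0 = 5.0 + torch.randn(10000, 2)
xT = forward_vp_sde(x0)
print(xT.mean(0), xT.var(0))  # approximately zeros and ones
```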
The generative power of diffusion models comes from reversing this process. We start with a sample $\mathbf{x}(T)$ from the simple noise distribution and evolve it backward in time from $t = T$ to $t = 0$ to generate a data sample $\mathbf{x}(0)$. A remarkable result from stochastic calculus (Anderson, 1982) states that the reverse trajectory of a diffusion process defined by a forward SDE also follows an SDE, provided we know the score function $\nabla_{\mathbf{x}} \log p_t(\mathbf{x})$ of the marginal distributions $p_t(\mathbf{x})$.
The reverse SDE corresponding to the forward process above is given by:

$$d\mathbf{x} = \left[-\tfrac{1}{2}\beta(t)\,\mathbf{x} - \beta(t)\,\nabla_{\mathbf{x}} \log p_t(\mathbf{x})\right] dt + \sqrt{\beta(t)}\,d\bar{\mathbf{w}}$$

Here:

- $dt$ is an infinitesimal negative time step, since time now runs backward from $T$ to $0$.
- $\nabla_{\mathbf{x}} \log p_t(\mathbf{x})$ is the score function of the marginal distribution at time $t$.
- $\bar{\mathbf{w}}$ is a standard Wiener process with time flowing in reverse.
This reverse SDE tells us how to infinitesimally adjust the current state $\mathbf{x}(t)$ to make it slightly more likely under the marginal distribution $p_t(\mathbf{x})$. The drift term now includes the score function, effectively guiding the process away from noise and towards plausible data structures.
The central challenge in using the reverse SDE for generation is that we don't know the true score function $\nabla_{\mathbf{x}} \log p_t(\mathbf{x})$ for the intermediate distributions $p_t(\mathbf{x})$. This is where neural networks come in. We train a time-dependent neural network, often denoted $s_\theta(\mathbf{x}, t)$, to approximate the true score function:

$$s_\theta(\mathbf{x}, t) \approx \nabla_{\mathbf{x}} \log p_t(\mathbf{x})$$
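To illustrate what such a network and its training signal can look like, here is a minimal PyTorch sketch using denoising score matching for the VP SDE with the linear $\beta(t)$ assumed earlier. All names (`ScoreNet`, `vp_marginal`, `dsm_loss`) are ours for illustration, not a fixed API:

```python
import torch
import torch.nn as nn

class ScoreNet(nn.Module):
    """A toy time-conditioned score model s_theta(x, t)."""
    def __init__(self, dim=2, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, dim),
        )

    def forward(self, x, t):
        # Condition on t by simply concatenating it to the input.
        return self.net(torch.cat([x, t[:, None]], dim=-1))

def vp_marginal(t, beta_min=0.1, beta_max=20.0):
    """Mean scale and std of p(x_t | x_0) under the VP SDE with linear beta(t)."""
    log_coef = -0.5 * (beta_min * t + 0.5 * (beta_max - beta_min) * t ** 2)
    scale = torch.exp(log_coef)
    std = torch.sqrt(1.0 - torch.exp(2.0 * log_coef))
    return scale[:, None], std[:, None]

def dsm_loss(model, x0):
    """Denoising score matching: regress s_theta(x_t, t) onto -eps / sigma(t)."""
    t = torch.rand(x0.shape[0]) * (1.0 - 1e-4) + 1e-4  # avoid t = 0
    scale, std = vp_marginal(t)
    eps = torch.randn_like(x0)
    xt = scale * x0 + std * eps
    target = -eps / std  # score of the Gaussian p(x_t | x_0)
    return ((model(xt, t) - target) ** 2 * std ** 2).sum(-1).mean()
```

Weighting the squared error by $\sigma(t)^2$ makes this objective equivalent to the noise-prediction loss used in DDPM, which is one way to see that DDPM training implicitly learns the score.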
This network is typically trained using objectives derived from score matching, or objectives equivalent to those used in DDPMs (like the objective mentioned in the chapter introduction, which implicitly learns the score). Once trained, $s_\theta(\mathbf{x}, t)$ can be plugged into the reverse SDE:

$$d\mathbf{x} = \left[-\tfrac{1}{2}\beta(t)\,\mathbf{x} - \beta(t)\,s_\theta(\mathbf{x}, t)\right] dt + \sqrt{\beta(t)}\,d\bar{\mathbf{w}}$$
Simulating this SDE backward in time, starting from $\mathbf{x}(T) \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$, allows us to generate new data samples $\mathbf{x}(0)$.
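A minimal sampler along these lines discretizes the reverse SDE with Euler-Maruyama, stepping from $t = T$ down to $t = 0$. It assumes a trained score model such as the `ScoreNet` sketch above and the same linear $\beta(t)$; both are illustrative assumptions:

```python
import torch

@torch.no_grad()
def reverse_sde_sample(model, shape, n_steps=1000, T=1.0,
                       beta_min=0.1, beta_max=20.0):
    """Euler-Maruyama simulation of the learned reverse VP SDE."""
    x = torch.randn(shape)  # start from x(T) ~ N(0, I)
    dt = T / n_steps
    for i in reversed(range(n_steps)):
        t = torch.full((shape[0],), (i + 1) * dt)
        b = (beta_min + t * (beta_max - beta_min))[:, None]  # beta(t)
        score = model(x, t)  # s_theta(x, t) approximates grad_x log p_t(x)
        drift = -0.5 * b * x - b * score
        x = x - drift * dt   # minus sign: time runs backward
        if i > 0:            # no noise injection on the final step
            x = x + (b * dt) ** 0.5 * torch.randn_like(x)
    return x  # approximate samples x(0) from the data distribution
```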
Diagram illustrating the forward SDE destroying data structure over time and the learned reverse SDE reconstructing data from noise by following the estimated score function.
Viewing diffusion models through the lens of SDEs offers several advantages:

- **Unification:** Discrete-time formulations such as DDPM correspond to particular discretizations of an underlying SDE, so seemingly different diffusion approaches become instances of one framework.
- **Flexible noise schedules:** The schedule $\beta(t)$ is a free design choice of the continuous process rather than a fixed sequence of discrete steps.
- **Flexible sampling:** Any numerical SDE solver can simulate the reverse process, allowing trade-offs between the number of steps (speed) and sample quality.
- **Deterministic counterparts:** Each diffusion SDE has an associated probability flow ODE with the same marginal distributions, which enables deterministic sampling.
This continuous-time perspective sets the stage for understanding score-based generative modeling and advanced techniques like DDIM, which uses properties of the underlying SDEs for efficient sampling. We will build on these foundations as we examine the implementation details and improvements of diffusion models in the subsequent sections.