At the heart of generative modeling lies the objective of capturing the underlying structure and probability distribution of a given dataset. Imagine you have a collection of images, say, handwritten digits. A generative model aims to understand how these digits are formed, not just classify them. More formally, if we represent our data points (images, audio signals, text sequences) as x, the goal is to learn or approximate the true data distribution, often denoted as pdata(x). This function tells us the probability (or probability density for continuous data) of observing any particular data point x.
Why is learning pdata(x) useful? A model that approximates it well can generate new samples that resemble the training data, assign likelihoods to observations (useful, for example, for spotting anomalous inputs), and capture the structure and variations underlying the dataset.
However, pdata(x) is almost always unknown and incredibly complex, especially for high-dimensional data like natural images. A 256×256 pixel color image resides in a space with 256×256×3 = 196,608 dimensions. Directly modeling the probability distribution in such a high-dimensional space is computationally challenging and requires vast amounts of data.
Therefore, instead of finding pdata(x) exactly, we use a model distribution, pmodel(θ;x), which is defined by a set of learnable parameters θ. These parameters are typically the weights and biases of a deep neural network. The core task of training a generative model is to adjust θ such that pmodel(θ;x) becomes as close as possible to the true (but unknown) pdata(x).
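To illustrate what pmodel(θ;x) means in code, the sketch below is a deliberately simple toy example of our own (not a model discussed later in this course): θ consists of the mean and log-variance of a diagonal Gaussian, and a log_prob method evaluates log pmodel(θ;x) for a batch of data points.

```python
import math
import torch

class GaussianModel(torch.nn.Module):
    """A toy p_model(theta; x): a diagonal Gaussian whose parameters are theta."""
    def __init__(self, dim):
        super().__init__()
        self.mean = torch.nn.Parameter(torch.zeros(dim))     # part of theta
        self.log_var = torch.nn.Parameter(torch.zeros(dim))  # part of theta

    def log_prob(self, x):
        # Evaluates log p_model(theta; x), summed over the data dimensions.
        var = self.log_var.exp()
        return -0.5 * ((x - self.mean) ** 2 / var
                       + self.log_var
                       + math.log(2 * math.pi)).sum(dim=-1)

model = GaussianModel(dim=2)
x = torch.randn(4, 2)      # a small batch standing in for real data points
print(model.log_prob(x))   # one log-density per data point, differentiable w.r.t. theta
```

In a real generative model, θ would be the weights and biases of a deep network, but the interface is the same: a density over x whose parameters we can adjust by gradient-based training.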
Diagram illustrating the relationship between the true data distribution (pdata), observed data samples, the generative model's distribution (pmodel), its parameters (θ), and the generated samples. The training process aims to adjust θ so that pmodel closely approximates pdata.
How do we measure the "closeness" between pmodel and pdata and optimize θ? Different families of generative models employ different strategies:
Explicit Density Models: These models define an explicit mathematical formula for pmodel(θ;x) and often use Maximum Likelihood Estimation (MLE) for training. The goal is to find parameters θ that maximize the (log) probability of observing the training data:

$$\theta^{*} = \arg\max_{\theta} \sum_{i=1}^{N} \log p_{\text{model}}(\theta; x^{(i)})$$

where x(i), i = 1, …, N, are the data points in the training set. While theoretically appealing, calculating or optimizing this likelihood can be intractable for many flexible models (like deep neural networks) due to complex dependencies or normalization constants. Techniques like Variational Autoencoders (VAEs), Flow-based Models, and Autoregressive Models fall under this umbrella, each using different methods to make the likelihood tractable or approximate it. Diffusion models, as we will see, also often connect to likelihood estimation, although their training objective might be formulated differently (e.g., score matching or denoising objectives).
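To make the MLE objective concrete, here is a minimal sketch. It stands in a one-dimensional Gaussian for the deep network and synthetic samples for a real dataset (both are illustrative assumptions), and maximizes the summed log-likelihood by minimizing its negative with gradient descent, as is standard in practice.

```python
import torch

torch.manual_seed(0)
data = torch.randn(1000) * 2.0 + 3.0           # samples from an "unknown" p_data

mean = torch.nn.Parameter(torch.zeros(()))     # theta: learnable mean
log_std = torch.nn.Parameter(torch.zeros(()))  # theta: learnable log std-dev
optimizer = torch.optim.Adam([mean, log_std], lr=0.05)

for step in range(500):
    dist = torch.distributions.Normal(mean, log_std.exp())
    nll = -dist.log_prob(data).sum()           # negative of the MLE objective
    optimizer.zero_grad()
    nll.backward()
    optimizer.step()

print(mean.item(), log_std.exp().item())       # should approach 3.0 and 2.0
```

Even in this toy setting, the structure matches what real explicit density models do: evaluate log pmodel on a batch of data, then update θ to increase it.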
Implicit Density Models: These models do not define an explicit pmodel(θ;x). Instead, they provide a mechanism to sample from the distribution they implicitly represent. Generative Adversarial Networks (GANs) are the prime example. A GAN's generator network G learns a transformation from a simple prior distribution pz(z) (e.g., Gaussian noise) to the complex data distribution. It learns to produce samples G(z) that are indistinguishable from real data x∼pdata(x), guided by the discriminator D. The min-max objective function you saw earlier drives this process, implicitly shaping the distribution of G(z) to match pdata(x) without ever needing to write down or compute the probability density of a generated sample.
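The sketch below shows this implicit setup on a one-dimensional toy problem. The network sizes, the synthetic data, and the commonly used non-saturating generator loss (training G so that D labels its samples as real) are our illustrative choices, not a fixed recipe.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
real_data = lambda n: torch.randn(n, 1) * 0.5 + 4.0   # stand-in for p_data

G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())
opt_G = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_D = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(2000):
    # Discriminator step: push D(x) toward 1 for real data, 0 for generated data
    x = real_data(64)
    z = torch.randn(64, 8)                      # z ~ p_z(z), a simple noise prior
    fake = G(z).detach()                        # don't update G on this step
    loss_D = bce(D(x), torch.ones(64, 1)) + bce(D(fake), torch.zeros(64, 1))
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()

    # Generator step: fool D, i.e. push D(G(z)) toward "real"
    z = torch.randn(64, 8)
    loss_G = bce(D(G(z)), torch.ones(64, 1))
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()

# G now samples from its implicit distribution; no density is ever written down
print(G(torch.randn(5, 8)))   # after training, samples should lie near the real data (around 4)
```

Note that nowhere do we evaluate a probability density for G(z); the generator's distribution is defined only through the samples it produces.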
Understanding this probabilistic foundation is essential. Whether explicitly maximizing likelihood or implicitly matching distributions through an adversarial game, the fundamental goal remains the same: to create a model capable of generating data that faithfully reflects the characteristics and variations present in the original dataset. As we progress, we will see how GANs and Diffusion Models leverage these probabilistic principles in distinct and powerful ways to achieve state-of-the-art results in synthetic data generation.