When we model sequences or time-dependent phenomena with Variational Autoencoders, such as with Recurrent VAEs (RVAEs) or VAEs incorporating attention mechanisms, we are implicitly or explicitly defining a system that evolves over time. This naturally brings us to the well-established domain of State-Space Models (SSMs), which have a long history in fields like control engineering, econometrics, and signal processing. Understanding the links between these two modeling paradigms can provide deeper insights into how sequential VAEs function and can guide the development of more principled architectures.
At their core, SSMs describe a system using a set of unobserved (latent) state variables $z_t$ that evolve over time, and a set of observed variables $x_t$ that depend on the current latent state. A typical discrete-time SSM is defined by two equations:

$$z_t = f(z_{t-1}, w_t) \quad \text{(state transition)}$$
$$x_t = g(z_t, v_t) \quad \text{(observation)}$$
The functions $f$ and $g$ define the dynamics and observation process, respectively. In classical linear-Gaussian SSMs, such as those handled by the Kalman filter, $f$ and $g$ are linear functions, and $w_t$ and $v_t$ are assumed to be Gaussian noise. These models allow exact inference of the latent states $p(z_t \mid x_{1:T})$ (smoothing) or $p(z_t \mid x_{1:t})$ (filtering) using efficient recursive algorithms.
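To make the classical baseline concrete, here is a minimal sketch (Python/NumPy) that simulates a small linear-Gaussian SSM and computes the filtering distributions exactly with a Kalman filter. The matrices `A`, `C`, `Q`, `R`, the dimensions, and the horizon are illustrative assumptions, not values from the text.

```python
import numpy as np

# Minimal linear-Gaussian SSM: z_t = A z_{t-1} + w_t,  x_t = C z_t + v_t,
# with w_t ~ N(0, Q) and v_t ~ N(0, R). All values below are illustrative.
rng = np.random.default_rng(0)
dz, dx, T = 2, 1, 100
A = np.array([[1.0, 0.1], [0.0, 0.95]])   # transition matrix (assumed)
C = np.array([[1.0, 0.0]])                # observation matrix (assumed)
Q = 0.01 * np.eye(dz)                     # transition noise covariance
R = 0.1 * np.eye(dx)                      # observation noise covariance

# Simulate a trajectory from the generative model.
z = np.zeros(dz)
xs = []
for _ in range(T):
    z = A @ z + rng.multivariate_normal(np.zeros(dz), Q)
    x = C @ z + rng.multivariate_normal(np.zeros(dx), R)
    xs.append(x)

# Kalman filter: exact computation of p(z_t | x_{1:t}) for this model class.
mu, P = np.zeros(dz), np.eye(dz)          # prior belief over the initial state
filtered_means = []
for x in xs:
    # Predict step: push the current belief through the linear dynamics.
    mu_pred = A @ mu
    P_pred = A @ P @ A.T + Q
    # Update step: correct the prediction with the new observation.
    S = C @ P_pred @ C.T + R
    K = P_pred @ C.T @ np.linalg.inv(S)   # Kalman gain
    mu = mu_pred + K @ (x - C @ mu_pred)
    P = (np.eye(dz) - K @ C) @ P_pred
    filtered_means.append(mu)
```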
Sequential VAEs can be viewed as a powerful generalization of SSMs, particularly non-linear SSMs. The parallels are direct:
Latent states: the SSM state $z_t$ plays the same role as the VAE's latent variable at time $t$.
Transition model: the dynamics function $f$ corresponds to the learned prior $p_\theta(z_t \mid z_{t-1})$ (or, more generally, $p_\theta(z_t \mid z_{<t})$).
Observation model: the emission function $g$ corresponds to the decoder $p_\theta(x_t \mid z_t)$.
Inference: filtering and smoothing correspond to the approximate posterior $q_\phi(z_t \mid \cdot)$ produced by the encoder.
The generative process of a sequential VAE often follows this SSM-like structure:
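Under a first-order Markov assumption on the latents (richer variants condition on $z_{<t}$ or on an RNN state), the joint distribution is commonly written as:

$$p_\theta(x_{1:T}, z_{1:T}) = p_\theta(z_1)\, p_\theta(x_1 \mid z_1) \prod_{t=2}^{T} p_\theta(z_t \mid z_{t-1})\, p_\theta(x_t \mid z_t)$$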
Structurally, their generative paths mirror each other: in both a State-Space Model and a sequential VAE, the latent variable at time $t-1$ informs the latent variable at time $t$, which in turn produces the observation at time $t$.
While the structural analogy is strong, there are important differences, primarily stemming from the use of neural networks and variational inference in VAEs:
Non-Linearity and Expressiveness: In a sequential VAE, both the transition and the emission distributions are parameterized by neural networks, so the model can capture strongly non-linear dynamics and complex, high-dimensional observations (e.g., video frames or audio) that linear-Gaussian SSMs cannot represent.
Inference: Linear-Gaussian SSMs admit exact, closed-form filtering and smoothing. With neural-network parameterizations the exact posterior becomes intractable, so sequential VAEs rely on amortized variational inference: an encoder network outputs an approximate posterior $q_\phi(z_t \mid \cdot)$, as illustrated in the sketch after this list.
Learning: Classical SSMs are typically fit by maximum likelihood, often via the EM algorithm with closed-form updates. Sequential VAEs are instead trained end-to-end by maximizing the evidence lower bound (ELBO) with stochastic gradients and the reparameterization trick.
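As a concrete picture of that amortized inference step, the minimal sketch below (Python/PyTorch, with illustrative dimensions and an assumed conditioning on an RNN summary $h_{t-1}$) maps the current observation and the recurrent context to the mean and log-variance of a diagonal-Gaussian $q_\phi(z_t \mid x_t, h_{t-1})$ and samples it with the reparameterization trick.

```python
import torch
import torch.nn as nn

class AmortizedPosterior(nn.Module):
    """Encoder producing q(z_t | x_t, h_{t-1}) as a diagonal Gaussian.

    A minimal sketch: the dimensions and the choice of conditioning on an
    RNN summary h_{t-1} are illustrative assumptions, not a fixed recipe.
    """
    def __init__(self, x_dim, h_dim, z_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(x_dim + h_dim, hidden), nn.ReLU(),
        )
        self.mean = nn.Linear(hidden, z_dim)
        self.logvar = nn.Linear(hidden, z_dim)

    def forward(self, x_t, h_prev):
        feats = self.net(torch.cat([x_t, h_prev], dim=-1))
        mu, logvar = self.mean(feats), self.logvar(feats)
        # Reparameterization trick: sample z_t while keeping gradients.
        z_t = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        return z_t, mu, logvar
```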
Several VAE architectures for sequential data explicitly or implicitly embody SSM principles:
Variational Recurrent Neural Network (VRNN): Integrates VAE principles within an RNN. At each time step, the RNN state $h_t$ influences the prior for $z_t$. The latent $z_t$ and observation $x_t$ then update $h_t$. This creates a dynamic system where latent variables guide sequence generation.
Deep Kalman Filters (DKFs) and Kalman VAEs (KVAEs): These models attempt to combine the structured probabilistic inference of Kalman filters with the expressive power of neural networks. For instance, a DKF might assume linear Gaussian transitions but use a neural network for the emission model:
$$p(z_t \mid z_{t-1}) = \mathcal{N}(z_t \mid A z_{t-1}, Q), \qquad p_\theta(x_t \mid z_t) = \mathcal{N}(x_t \mid \mathrm{NN}_\theta(z_t), R)$$

Inference in such models can still be challenging, and VAE-based approaches (like KVAEs) use variational methods to approximate the posterior over the structured latent states.
Deep Markov Models (DMMs): This term is often used for sequential VAEs in which the latent states $z_t$ follow a first-order Markov process, $p(z_t \mid z_{t-1})$, typically parameterized by a neural network; a minimal generative sketch of this structure follows the list.
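To tie these architectures back to the SSM equations above, here is a minimal generative sketch of a DMM-style model in Python/PyTorch: one network parameterizes the transition $p(z_t \mid z_{t-1})$ and another the emission $p(x_t \mid z_t)$, both as diagonal Gaussians. The layer sizes, dimensions, initial state, and rollout length are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DeepMarkovGenerative(nn.Module):
    """Generative half of a DMM-style sequential VAE (a minimal sketch).

    Both the transition p(z_t | z_{t-1}) and the emission p(x_t | z_t) are
    diagonal Gaussians parameterized by small MLPs; sizes are illustrative.
    """
    def __init__(self, z_dim=8, x_dim=16, hidden=64):
        super().__init__()
        self.trans = nn.Sequential(nn.Linear(z_dim, hidden), nn.ReLU(),
                                   nn.Linear(hidden, 2 * z_dim))
        self.emit = nn.Sequential(nn.Linear(z_dim, hidden), nn.ReLU(),
                                  nn.Linear(hidden, 2 * x_dim))

    def step(self, z_prev):
        # Transition: parameters of p(z_t | z_{t-1}), then sample z_t.
        z_mu, z_logvar = self.trans(z_prev).chunk(2, dim=-1)
        z_t = z_mu + torch.exp(0.5 * z_logvar) * torch.randn_like(z_mu)
        # Emission: parameters of p(x_t | z_t), then sample x_t.
        x_mu, x_logvar = self.emit(z_t).chunk(2, dim=-1)
        x_t = x_mu + torch.exp(0.5 * x_logvar) * torch.randn_like(x_mu)
        return z_t, x_t

# Roll the generative model forward from a simple fixed initial state.
model = DeepMarkovGenerative()
z = torch.zeros(1, 8)
samples = []
for _ in range(20):
    z, x = model.step(z)
    samples.append(x)
```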
Recognizing sequential VAEs as sophisticated SSMs offers several benefits:
Inference design: classical notions of filtering and smoothing suggest how much context the approximate posterior should condition on (past observations only, or the full sequence).
Structured latents: ideas from the SSM literature, such as the partially linear transitions used in KVAEs, can be imposed on the latent dynamics to improve tractability and interpretability.
Probabilistic semantics: the state-space view gives the latent variables a clear dynamical interpretation, which supports tasks like forecasting and handling missing observations.
However, the high dimensionality and non-linear nature of latent spaces in VAEs mean that direct application of classical SSM analysis tools can be difficult. The interpretability of learned dynamics in complex VAEs remains an active area of research.
In summary, the relationship between VAEs for temporal data and State-Space Models is profound. VAEs extend the SSM framework by incorporating powerful non-linear function approximators (neural networks) and scalable inference techniques (amortized variational inference). This allows them to model significantly more complex sequential data than traditional SSMs, while the SSM perspective provides a valuable framework for understanding and designing these advanced generative models. As VAEs continue to evolve, this connection will likely inspire further innovations in modeling dynamic systems.