While Generative Adversarial Networks are frequently associated with creating realistic images, their application extends to generating various other forms of data. This chapter shifts focus from standard image synthesis to examine how GANs can be adapted for different modalities.
You will learn about the specific difficulties encountered when applying GANs to non-image data. For discrete sequences such as text, the central difficulty is that sampling a token is a non-differentiable operation, so the discriminator's gradient cannot flow back to the generator directly. We will discuss two families of workarounds: reinforcement learning signals, which treat the discriminator's output as a reward, and continuous approximations like the Gumbel-Softmax trick. We will also cover techniques for generating audio waveforms and spectrograms, coherent video sequences, 3D data representations like point clouds, and structured data such as graphs. Throughout, the chapter examines the architectural modifications and training strategies needed to apply adversarial training to these diverse data types.
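As a brief preview of the continuous-approximation idea, the following is a minimal sketch of a Gumbel-Softmax sample in plain Python. It is an illustration under stated assumptions, not the implementation used later in the chapter: the function name `gumbel_softmax`, the `temperature` parameter, and the example logits are all chosen here for demonstration. The sketch adds Gumbel noise to the logits and applies a temperature-scaled softmax, which yields a probability vector that approaches a one-hot sample as the temperature decreases.

```python
import math
import random


def gumbel_softmax(logits, temperature=1.0, rng=random):
    """Return a relaxed one-hot sample (a probability vector) over the categories.

    Illustrative helper; the name and signature are assumptions for this sketch.
    """
    eps = 1e-20  # guards against log(0)
    noisy = []
    for logit in logits:
        u = rng.random()                          # u ~ Uniform[0, 1)
        g = -math.log(-math.log(u + eps) + eps)   # Gumbel(0, 1) noise
        noisy.append((logit + g) / temperature)

    # Numerically stable softmax over the perturbed, temperature-scaled logits.
    m = max(noisy)
    exps = [math.exp(v - m) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)        # fixed seed so runs are repeatable
logits = [2.0, 0.5, -1.0]     # hypothetical unnormalized scores for three tokens

soft = gumbel_softmax(logits, temperature=1.0, rng=rng)   # smoother distribution
sharp = gumbel_softmax(logits, temperature=0.1, rng=rng)  # typically near one-hot
print(soft)
print(sharp)
```

In an automatic-differentiation framework the same computation is differentiable with respect to the logits, which is what lets the discriminator's gradient reach the generator. PyTorch, for instance, provides this operation as `torch.nn.functional.gumbel_softmax`.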
6.1 Challenges with Discrete Data: Text Generation
6.2 Reinforcement Learning Approaches (SeqGAN, RankGAN)
6.3 Continuous Approximations (Gumbel-Softmax)
6.4 Audio Synthesis with GANs (WaveGAN, SpecGAN)
6.5 Video Generation and Prediction
6.6 3D Data Generation (Point Clouds, Meshes)
6.7 Graph Generation with GANs
© 2025 ApX Machine Learning