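Adversarial Audio Synthesis, Chris Donahue, Julian McAuley, Miller Puckette, 2019, International Conference on Learning Representations (ICLR). DOI: 10.48550/arXiv.1802.04208 - Introduces WaveGAN, a seminal model that synthesizes raw audio waveforms directly with 1D convolutional GANs. It also introduces phase shuffle, which randomly shifts discriminator activations by a few samples so the discriminator cannot reject generated audio based on the periodic artifacts produced by transposed convolutions.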
Improved Training of Wasserstein GANs, Ishaan Gulrajani, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin, Aaron C. Courville, 2017, Advances in Neural Information Processing Systems (NeurIPS), Vol. 30. DOI: 10.48550/arXiv.1704.00028 - Presents the Wasserstein GAN with gradient penalty (WGAN-GP), a robust and widely adopted technique for stabilizing GAN training. It is directly relevant to WaveGAN, which uses WGAN-GP as its training objective.
MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis, Kundan Kumar, Rithesh Kumar, Thibault de Boissiere, Lucas Gestin, Wei Zhen Teoh, Jose Sotelo, Alexandre de Brebisson, Yoshua Bengio, Aaron Courville, 2019, Advances in Neural Information Processing Systems (NeurIPS), Vol. 32. DOI: 10.48550/arXiv.1910.06711 - Introduces a fast, high-fidelity GAN-based vocoder that converts mel-spectrograms into raw audio waveforms, a component that largely determines the output quality of spectrogram-based audio synthesis methods.