WaveNet: A Generative Model for Raw Audio, Aaron van den Oord, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alex Graves, Nal Kalchbrenner, Andrew Senior, Koray Kavukcuoglu, 2016, arXiv preprint arXiv:1609.03499, DOI: 10.48550/arXiv.1609.03499 - The foundational paper introducing the WaveNet architecture, including causal and dilated convolutions, for high-fidelity raw audio generation.
Efficient Neural Audio Synthesis, Nal Kalchbrenner, Erich Elsen, Karen Simonyan, Seb Noury, Norman Casagrande, Edward Lockhart, Florian Stimberg, Aaron van den Oord, Sander Dieleman, Koray Kavukcuoglu, 2018, International Conference on Machine Learning (ICML), DOI: 10.48550/arXiv.1802.08435 - The original paper introducing WaveRNN, an efficient autoregressive model for neural audio synthesis, with an emphasis on speed optimizations.
A Survey of Text-to-Speech Synthesis Based on Deep Neural Networks, Heiga Zen, Andrew Senior, Mike Schuster, 2019, 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2019), IEEE, DOI: 10.1109/ICASSP.2019.8682122 - A comprehensive survey of text-to-speech synthesis based on deep neural networks, providing context and discussion of neural vocoders such as WaveNet.