This chapter shifts focus to the generation of speech, detailing the methods used to build modern Text-to-Speech (TTS) systems. The aim is to progress from basic synthesis concepts to techniques capable of producing high-fidelity, natural-sounding, and controllable artificial voices.
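You will examine the architecture and training processes for several categories of state-of-the-art acoustic models: autoregressive models (Tacotron, Transformer TTS), non-autoregressive models (FastSpeech, ParaNet), flow-based models, and Generative Adversarial Networks (GANs).
Beyond the core model architectures, we will cover methods for modeling and controlling prosody, synthesizing expressive speech, and cloning and converting voices.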
The chapter includes a hands-on practical section focused on training an advanced TTS model using a contemporary toolkit.
4.1 Autoregressive Acoustic Models (Tacotron, Transformer TTS)
4.2 Non-Autoregressive Acoustic Models (FastSpeech, ParaNet)
4.3 Flow-Based Models for TTS
4.4 Generative Adversarial Networks (GANs) in TTS
4.5 Prosody Modeling and Control
4.6 Expressive Speech Synthesis
4.7 Voice Cloning and Conversion
4.8 Hands-on Practical: Training an Advanced TTS Model
© 2025 ApX Machine Learning