Advanced Speech Recognition and Synthesis
Overview of Speech Processing Toolkits (ESPnet, NeMo, Coqui)
© 2025 ApX Machine Learning
Survey of Popular Speech Processing Toolkits