Applied Speech Recognition

This course is a practical guide to building and implementing automatic speech recognition (ASR) systems. It covers the complete workflow, from audio signal processing and feature extraction through constructing and training modern deep learning models. Participants will work with contemporary architectures and training objectives, such as LSTMs, Transformers, and Connectionist Temporal Classification (CTC) loss, to build functional speech-to-text pipelines. The material is designed for engineers and developers with a foundation in machine learning who want to develop practical skills in speech technology.

Prerequisites: Python & ML foundation

Level: Professional

Audio Preprocessing
Preprocess and prepare audio data for ASR models.
Feature Extraction
Implement feature extraction techniques like MFCCs and Log-Mel Spectrograms.
Acoustic Modeling
Build and train acoustic models using RNNs, LSTMs, and Transformers.
Language Modeling
Integrate language models into the decoding process for improved accuracy.
System Evaluation
Evaluate and benchmark ASR system performance using standard metrics like WER.
Deployment
Construct a functional speech-to-text application pipeline.
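
As a small illustration of the System Evaluation outcome: word error rate (WER) is the number of word-level substitutions, deletions, and insertions needed to turn the hypothesis into the reference, divided by the number of words in the reference. The following is a minimal sketch in Python, assuming whitespace tokenization and no text normalization (casing and punctuation are compared as-is). The function name word_error_rate is illustrative and not taken from the course materials.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion (reference word missing)
                curr[j - 1] + 1,                  # insertion (extra hypothesis word)
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One deleted word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference. Real benchmarks usually normalize text (lowercasing, stripping punctuation) before scoring, which this sketch leaves out.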