Once raw audio has been converted into a sequence of feature vectors, the next step is to map those features to the basic sounds of a language. This is the primary function of the acoustic model. It addresses the question: given a small segment of audio, how likely is it that a specific phoneme, such as /k/, /æ/, or /t/, was spoken?
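The acoustic model captures the statistical relationship between the audio signal and its corresponding phonetic units. Specifically, it computes the likelihood $P(\text{audio features} \mid \text{phoneme})$, which is how probable the observed features are if a given phoneme was spoken. This is the reverse of the question posed above, and Bayes' rule connects the two: $P(\text{phoneme} \mid \text{audio features}) \propto P(\text{audio features} \mid \text{phoneme})\,P(\text{phoneme})$. The likelihood serves as a key input for the final transcription process.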
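To make the likelihood concrete, here is a minimal sketch that scores one feature frame against a few phonemes. It assumes, purely for illustration, that each phoneme is modeled by a single diagonal Gaussian over 2-dimensional features; the model parameters in `PHONEME_MODELS`, the frame values, and the helper names `log_likelihood` and `score_frame` are all hypothetical. It computes $\log P(\text{features} \mid \text{phoneme})$ for each phoneme, then applies Bayes' rule with uniform priors to obtain posteriors.

```python
import math

# Hypothetical toy models: one diagonal Gaussian per phoneme over
# 2-dimensional feature vectors. Real systems use more dimensions
# (e.g. MFCC-style features) and parameters trained from data;
# these numbers are invented for illustration only.
PHONEME_MODELS = {
    "/k/": {"mean": [1.0, -0.5], "var": [0.5, 0.4]},
    "/æ/": {"mean": [-1.2, 0.8], "var": [0.6, 0.3]},
    "/t/": {"mean": [0.9, 0.2], "var": [0.4, 0.5]},
}


def log_likelihood(features, mean, var):
    """Log density of a diagonal Gaussian, i.e. log P(features | phoneme)."""
    total = 0.0
    for x, mu, v in zip(features, mean, var):
        total += -0.5 * (math.log(2 * math.pi * v) + (x - mu) ** 2 / v)
    return total


def score_frame(features, models, priors=None):
    """Return (log-likelihoods, posteriors) for one feature frame.

    Posteriors follow Bayes' rule: P(ph | x) is proportional to
    P(x | ph) * P(ph). Uniform priors are assumed when none are given.
    """
    if priors is None:
        priors = {ph: 1.0 / len(models) for ph in models}
    log_liks = {
        ph: log_likelihood(features, m["mean"], m["var"])
        for ph, m in models.items()
    }
    # Combine with priors in log space, then normalize with log-sum-exp
    # so the posteriors sum to 1 without numerical underflow.
    log_joint = {ph: ll + math.log(priors[ph]) for ph, ll in log_liks.items()}
    peak = max(log_joint.values())
    norm = peak + math.log(sum(math.exp(v - peak) for v in log_joint.values()))
    posteriors = {ph: math.exp(v - norm) for ph, v in log_joint.items()}
    return log_liks, posteriors


frame = [0.95, -0.4]  # one hypothetical feature vector
log_liks, posteriors = score_frame(frame, PHONEME_MODELS)
for ph in PHONEME_MODELS:
    print(f"{ph}: log P(x|ph) = {log_liks[ph]:.2f}, P(ph|x) = {posteriors[ph]:.3f}")
print("Most likely:", max(posteriors, key=posteriors.get))
```

A single Gaussian per phoneme is the simplest possible choice. Gaussian Mixture Models (Section 3.3) replace it with a weighted mixture of Gaussians to capture more varied pronunciations.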
In this chapter, you will cover:
3.1 What is an Acoustic Model?
3.2 Mapping Sounds to Phonemes
3.3 Early Approaches: Gaussian Mixture Models (GMMs)
3.4 Hidden Markov Models (HMMs) for Sequential Data
3.5 Combining GMMs and HMMs
3.6 Introduction to Neural Network-based Acoustic Models
3.7 The Role of an Acoustic Model in an ASR System
By the end, you will have a clear picture of how this component connects processed sound to the building blocks of speech.
© 2026 ApX Machine Learning