In the preceding chapters, we constructed acoustic models that map audio features to sequences of character probabilities. While these models are effective at identifying phonetic content, their output can be acoustically plausible yet linguistically incorrect. For instance, a model might transcribe "recognize speech" as the phonetically similar "wreck a nice beach".
This is where a language model (LM) comes in. An LM scores a sequence of words based on its grammatical structure and the likelihood of its occurrence, helping the system distinguish between sensible and nonsensical transcriptions. The process of using an LM to guide the selection of the final text from the acoustic model's predictions is called decoding. A decoder's objective is to find the word sequence $W$ that maximizes a combined score, often a weighted sum of the acoustic and language model log-probabilities:
$$\text{score}(W) = \log P_{\text{Acoustic}}(X \mid W) + \alpha \log P_{\text{LM}}(W)$$

Here, $P_{\text{Acoustic}}(X \mid W)$ is the probability the acoustic model assigns to the audio features $X$ given the word sequence $W$, $P_{\text{LM}}(W)$ is the probability of the word sequence itself, and $\alpha$ is a weight that balances the influence of the two models.
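To make the effect of this score concrete, here is a minimal Python sketch that compares two competing hypotheses from the chapter's opening example. The log-probability values and the weight $\alpha = 0.5$ are made-up numbers chosen for illustration, not outputs of any real acoustic or language model.

```python
import math  # kept for clarity: scores are natural-log probabilities

def combined_score(log_p_acoustic: float, log_p_lm: float, alpha: float = 0.5) -> float:
    """Weighted sum of acoustic and language model log-probabilities."""
    return log_p_acoustic + alpha * log_p_lm

# Two acoustically similar hypotheses for the same audio (illustrative values):
# the LM assigns far higher probability to the grammatical word sequence.
hypotheses = {
    "recognize speech":   {"log_p_ac": -4.1, "log_p_lm": -6.0},
    "wreck a nice beach": {"log_p_ac": -3.9, "log_p_lm": -14.0},
}

for words, p in hypotheses.items():
    score = combined_score(p["log_p_ac"], p["log_p_lm"], alpha=0.5)
    print(f"{words!r}: score = {score:.2f}")
```

With these numbers, "recognize speech" scores $-4.1 + 0.5 \times (-6.0) = -7.10$ while "wreck a nice beach" scores $-3.9 + 0.5 \times (-14.0) = -10.90$, so the linguistically sensible transcription wins even though its acoustic score is slightly worse. Tuning $\alpha$ shifts how much the decoder trusts the LM over the acoustics.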
This chapter covers the theory and practice of integrating language models into an ASR system, working through the following sections:
5.1 The Function of Language Models in ASR
5.2 N-gram Language Models
5.3 Building an N-gram Model with KenLM
5.4 Decoding Graphs for Model Integration
5.5 Decoding Algorithms: Greedy Search vs Beam Search
5.6 Implementing Beam Search with a Language Model
5.7 Hands-on Practical: Integrating a Language Model into a CTC Decoder