An acoustic model can estimate the probability of phonemes given a piece of audio, but this alone is insufficient for accurate transcription. For example, the phrases "recognize speech" and "wreck a nice beach" sound very similar, so an acoustic model might assign high probabilities to both interpretations. To resolve this ambiguity, the system needs to know which sequence of words is more likely in the language being spoken.
This chapter introduces the language model, the component responsible for adding linguistic context to the recognition process. By assigning a probability to a sequence of words, the language model helps the ASR system choose the most plausible transcription from a set of acoustically similar candidates.
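As a rough illustration of that selection step, the sketch below rescores two acoustically similar candidates with a toy bigram language model. Everything in it is an assumption made for illustration: the bigram probabilities, the acoustic log scores, the `UNSEEN_PROB` floor, and the helper names `lm_log_prob` and `pick_transcription` are all hypothetical, and a real ASR system would estimate these quantities from data.

```python
import math

# Hypothetical bigram probabilities P(word | previous word), invented for illustration.
BIGRAM_PROBS = {
    ("<s>", "recognize"): 0.01,
    ("recognize", "speech"): 0.2,
    ("<s>", "wreck"): 0.001,
    ("wreck", "a"): 0.1,
    ("a", "nice"): 0.01,
    ("nice", "beach"): 0.005,
}
UNSEEN_PROB = 1e-6  # crude floor for bigrams missing from the table (an assumption)


def lm_log_prob(words):
    """Sum of log bigram probabilities for a word sequence, starting from <s>."""
    total = 0.0
    prev = "<s>"
    for word in words:
        total += math.log(BIGRAM_PROBS.get((prev, word), UNSEEN_PROB))
        prev = word
    return total


def pick_transcription(candidates, lm_weight=1.0):
    """Return the candidate text with the highest combined score.

    candidates: list of (text, acoustic_log_score) pairs.
    """
    def combined(item):
        text, acoustic = item
        return acoustic + lm_weight * lm_log_prob(text.split())

    return max(candidates, key=combined)[0]


# Hypothetical acoustic log scores: nearly tied, slightly favoring the wrong phrase.
candidates = [
    ("recognize speech", -12.1),
    ("wreck a nice beach", -12.0),
]
best = pick_transcription(candidates)
print(best)
```

With these made-up numbers, the acoustic scores alone slightly prefer "wreck a nice beach", while adding the language model score tips the decision toward "recognize speech". Setting `lm_weight=0.0` ignores the language model and shows the acoustic-only choice.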
You will learn about the following topics:
4.1 What is a Language Model?
4.2 The Problem of Ambiguity in Speech
4.3 N-gram Language Models: Bigrams and Trigrams
4.4 Calculating Probabilities of Word Sequences
4.5 The Concept of Perplexity
4.6 How Language Models Improve Accuracy
4.7 Introduction to Neural Network Language Models
© 2026 ApX Machine Learning