When you call a speech recognition service, you are sending audio data over a network and waiting for a response. This process is not instantaneous and can fail for reasons outside your program's control. The service might be temporarily down, your internet connection could drop, or the audio you sent might not contain any recognizable speech. A well-built application must anticipate these issues and respond gracefully instead of crashing.
This section covers how to handle the different responses from an ASR service, including both successful transcriptions and common errors. We will use Python's try...except blocks to build more resilient speech recognition code.
When a recognition attempt succeeds, most libraries and APIs return more than just the transcribed text. The result is usually a structured response, such as a dictionary or object, that carries additional metadata. The exact structure varies between services (for example, the Google Web Speech API vs. Wit.ai), but the kinds of information they include are often similar.
For example, a service might return multiple possible transcriptions, each with a confidence score.
```python
# A potential response object from an ASR service
{
    "transcriptions": [
        {
            "transcript": "what time is it",
            "confidence": 0.94
        },
        {
            "transcript": "what time was it",
            "confidence": 0.05
        }
    ],
    "is_final": True,
    "language_code": "en-US"
}
```
In this response, the most likely transcription is "what time is it" with a 94% confidence score. Accessing this extra information can be useful for more advanced applications. For instance, if the top confidence score is very low, you might ask the user to repeat themselves. For now, our primary goal is to reliably get the main transcript.
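As a minimal sketch of that idea, the function below picks the highest-confidence transcript from a response shaped like the example above and returns None when the best score falls under a threshold, which signals that the application should ask the user to repeat. The name pick_transcript and the 0.6 threshold are illustrative assumptions; real services use their own field names, so the dictionary keys would need adapting.

```python
# Minimal sketch: pick the best transcript from a hypothetical ASR response.
# The layout mirrors the example response above; real providers use different
# field names, so adapt the keys to your service.


def pick_transcript(response, min_confidence=0.6):
    """Return the highest-confidence transcript, or None if it is too uncertain.

    min_confidence is an arbitrary example threshold, not a recommended value.
    """
    alternatives = response.get("transcriptions", [])
    if not alternatives:
        return None
    # Treat a missing confidence value as 0.0 in case a service omits it.
    best = max(alternatives, key=lambda alt: alt.get("confidence", 0.0))
    if best.get("confidence", 0.0) < min_confidence:
        return None  # The caller can ask the user to repeat themselves.
    return best["transcript"]


response = {
    "transcriptions": [
        {"transcript": "what time is it", "confidence": 0.94},
        {"transcript": "what time was it", "confidence": 0.05},
    ],
    "is_final": True,
    "language_code": "en-US",
}

text = pick_transcript(response)
if text is None:
    print("Sorry, I didn't catch that. Could you repeat it?")
else:
    print("You said: " + text)
```

Returning None keeps the low-confidence case separate from outright failures, which the library reports through exceptions instead.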
In programming, we can't plan only for the "happy path" where everything works perfectly; we also have to handle the inevitable errors. In the popular SpeechRecognition library for Python, the recognition methods raise exceptions when something goes wrong. The two you will encounter most often are UnknownValueError and RequestError.
Sometimes, the audio is successfully sent to the ASR service, but the service cannot find any recognizable speech in it. This can happen if the microphone only picked up background noise, if the speaker mumbled, or if the audio was silent.
In this case, the SpeechRecognition library raises an UnknownValueError. Your program should catch this exception and inform the user that the audio could not be understood.
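One common way to inform the user is to ask them to try again. The sketch below repeats a capture-and-recognize step a limited number of times. So that it runs without a microphone, UnknownValueError here is a locally defined stand-in for sr.UnknownValueError, and listen_until_understood and capture_and_recognize are hypothetical names; in a real program, capture_and_recognize would wrap r.listen and r.recognize_google.

```python
# Minimal sketch: ask the user to repeat when speech is not understood.
# UnknownValueError is a hypothetical stand-in for sr.UnknownValueError so the
# example runs without a microphone; in real code, catch the library's class.


class UnknownValueError(Exception):
    """Stand-in for speech_recognition.UnknownValueError."""


def listen_until_understood(capture_and_recognize, max_attempts=3):
    """Call capture_and_recognize() until it returns text or attempts run out.

    capture_and_recognize is a hypothetical zero-argument callable that records
    audio and returns its transcript, raising UnknownValueError on unclear speech.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return capture_and_recognize()
        except UnknownValueError:
            if attempt < max_attempts:
                print("Sorry, I could not understand that. Please try again.")
    print(f"Giving up after {max_attempts} unclear attempts.")
    return None


# Simulated recognizer: unclear twice, then succeeds.
results = iter([UnknownValueError(), UnknownValueError(), "what time is it"])


def fake_capture_and_recognize():
    result = next(results)
    if isinstance(result, Exception):
        raise result
    return result


print(listen_until_understood(fake_capture_and_recognize))
```

Only the unintelligible case is retried. Any other exception, such as a service error, propagates to the caller unchanged.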
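Another common issue occurs when your program cannot communicate with the ASR service at all. This can happen if your internet connection is down, if the service itself is temporarily unavailable, or, for services that require one, if your API key is invalid.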
For these situations, the library raises a RequestError. Catching this allows you to provide a helpful message, like "Could not connect to the speech recognition service," which is much better than letting the program terminate with a cryptic network error.
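Some causes of a RequestError, such as a briefly dropped connection, may clear up on their own, so an application might retry a few times before showing that message. The sketch below assumes a hypothetical helper, recognize_with_retry, that waits with exponential backoff between attempts. RequestError is a local stand-in for sr.RequestError, and the retry count and delays are illustrative, not values recommended by any service.

```python
# Minimal sketch: retry a recognition call when the service is unreachable.
# RequestError is a hypothetical stand-in for sr.RequestError; the retry count
# and delays are illustrative assumptions.
import time


class RequestError(Exception):
    """Stand-in for speech_recognition.RequestError."""


def recognize_with_retry(recognize, audio, retries=3, base_delay=0.5):
    """Call recognize(audio), retrying on RequestError with exponential backoff.

    Waits base_delay, 2 * base_delay, 4 * base_delay, ... seconds between
    attempts and re-raises the last RequestError if every attempt fails.
    """
    for attempt in range(retries):
        try:
            return recognize(audio)
        except RequestError:
            if attempt == retries - 1:
                raise  # Out of attempts: let the caller show its error message.
            time.sleep(base_delay * (2 ** attempt))


# Simulated service that fails once, then answers.
calls = {"count": 0}


def flaky_recognize(audio):
    calls["count"] += 1
    if calls["count"] == 1:
        raise RequestError("connection failed")
    return "what time is it"


try:
    print(recognize_with_retry(flaky_recognize, b"fake-audio", base_delay=0.01))
except RequestError as e:
    print(f"Could not reach the speech service; {e}")
```

UnknownValueError is deliberately not retried here, since sending the same unclear audio again is unlikely to produce a different result. With the real library, you would change the except clause to sr.RequestError and pass r.recognize_google as recognize.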
To handle these potential failures, you should wrap your API calls in a try...except block. This tells Python: "Try to run this code, but if a specific error occurs, don't crash. Instead, run this other block of code."
Let's take the simple transcription script from the previous section and make it more robust.
Here is the original, "happy path" code:
```python
# WARNING: This code does not handle errors!
import speech_recognition as sr

r = sr.Recognizer()
with sr.Microphone() as source:
    print("Say something!")
    audio = r.listen(source)

# This line will crash if speech is not understood or the API is down
text = r.recognize_google(audio)
print("You said: " + text)
```
If you run this code and the microphone picks up only background noise or mumbling, the program will crash with an UnknownValueError.
Now, let's add error handling:
```python
# This code includes error handling
import speech_recognition as sr

r = sr.Recognizer()
with sr.Microphone() as source:
    print("Say something!")
    audio = r.listen(source)

# Use a try...except block to handle potential errors
try:
    # Attempt to recognize the speech
    text = r.recognize_google(audio)
    print("You said: " + text)
except sr.UnknownValueError:
    # This block runs if the speech was unintelligible
    print("Google Speech Recognition could not understand audio")
except sr.RequestError as e:
    # This block runs if there was a problem with the service
    print(f"Could not request results from Google Speech Recognition service; {e}")
```
This updated version is much more user-friendly. It provides clear feedback for each of the common failure modes instead of crashing. The logic of this flow can be visualized as a decision path.
Figure: Control flow for a speech recognition attempt. The program tries to perform the recognition and follows a different path depending on whether it succeeds or encounters a specific error.
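The same decision path can also be written as a small function that turns one recognition attempt into an outcome label and a message, which keeps the branching testable without a microphone. This is a sketch under assumptions: attempt_recognition and the outcome labels are hypothetical names, and the two exception classes are local stand-ins for sr.UnknownValueError and sr.RequestError.

```python
# Minimal sketch of the decision path: one recognition attempt, three outcomes.
# The exception classes are hypothetical stand-ins for sr.UnknownValueError and
# sr.RequestError so this runs without a microphone or network access.


class UnknownValueError(Exception):
    """Stand-in for speech_recognition.UnknownValueError."""


class RequestError(Exception):
    """Stand-in for speech_recognition.RequestError."""


def attempt_recognition(recognize, audio):
    """Return an (outcome, message) pair for a single recognition attempt.

    outcome is "success", "unintelligible", or "service_error".
    """
    try:
        text = recognize(audio)
    except UnknownValueError:
        return "unintelligible", "Could not understand the audio."
    except RequestError as e:
        return "service_error", f"Could not reach the speech service; {e}"
    return "success", text


# Simulated recognizers, one per branch of the decision path.
def recognize_ok(audio):
    return "what time is it"


def recognize_unclear(audio):
    raise UnknownValueError()


def recognize_offline(audio):
    raise RequestError("no internet connection")


for fn in (recognize_ok, recognize_unclear, recognize_offline):
    print(attempt_recognition(fn, b"fake-audio"))
```

With the real library, you would catch sr.UnknownValueError and sr.RequestError instead of the stand-ins and call attempt_recognition(r.recognize_google, audio).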
As you move on to build the voice command tool in the next section, keep these guidelines in mind:
- Wrap every call to the recognition service in a try block.
- Catch speech_recognition.UnknownValueError and speech_recognition.RequestError separately to provide distinct, informative feedback to the user.
By handling both success and failure, you ensure your application is reliable and provides a good user experience, which is a significant step in moving from simple scripts to functional applications.
© 2026 ApX Machine Learning