Converting a pre-recorded audio file into text is a fundamental task in speech recognition. It forms the basis for many applications, such as transcribing interviews, generating subtitles for videos, or analyzing customer service calls. In Python, the SpeechRecognition library provides a convenient interface to several popular automatic speech recognition (ASR) services.
For this exercise, you will need an audio file. The WAV format is a good choice because it stores uncompressed (PCM) audio that the library can read directly, without extra software. If you don't have one, you can easily find or record a short WAV file of someone speaking a clear sentence, such as "hello, this is a test". Save it in the same directory as your Python script and name it test-audio.wav.
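Before transcribing, it can help to confirm that the file really is a readable WAV file and to check its basic properties. The sketch below uses only Python's standard wave module; describe_wav is a hypothetical helper name, not part of SpeechRecognition. wave.open raises wave.Error for files it cannot parse as WAV, such as an MP3 that was simply renamed to .wav.

```python
import wave


def describe_wav(path: str) -> dict:
    """Return basic properties of a WAV file (hypothetical helper).

    wave.open raises wave.Error if the file is not a WAV it can parse.
    """
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()      # number of audio frames in the file
        rate = wav.getframerate()      # frames per second
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": rate,
            "duration_s": frames / rate,
        }
```

Calling describe_wav("test-audio.wav") returns a dictionary with the channel count, the sample width in bytes, the sample rate, and the duration in seconds.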
The process of transcribing an audio file with the SpeechRecognition library involves a few distinct steps. First, your script initializes a Recognizer object. Next, it opens the audio file, and the recognizer reads the file's data into a format the library can process. Finally, this audio data is sent to an external ASR engine, which returns the transcribed text.
Figure: The steps involved in transcribing an audio file using a Python script and an ASR API.
Let's write the Python code to make this happen. Create a new file named transcribe_file.py and add the following code. We will walk through each part of the script.
```python
import speech_recognition as sr

# 1. Initialize the recognizer
r = sr.Recognizer()

# 2. Define the audio file path
audio_file = "test-audio.wav"

# 3. Open the audio file and process it
with sr.AudioFile(audio_file) as source:
    # Read the audio data from the file
    audio_data = r.record(source)

# 4. Perform recognition
print("Transcribing audio...")
try:
    # Use Google's free web speech API
    text = r.recognize_google(audio_data)
    print(f"Transcription: {text}")
except sr.UnknownValueError:
    # API was unable to understand the audio
    print("Google Speech Recognition could not understand the audio.")
except sr.RequestError as e:
    # API was unreachable or unresponsive
    print(f"Could not request results from Google Speech Recognition service; {e}")
```
Let's break down the script into its main components.
```python
import speech_recognition as sr

r = sr.Recognizer()
```
First, we import the library, using sr as a standard alias to keep our code concise. Then, we create an instance of the Recognizer class. This r object is the central piece of our application. It is responsible for handling audio input and communicating with the ASR services.
```python
audio_file = "test-audio.wav"

with sr.AudioFile(audio_file) as source:
    # ... code to process the file goes here ...
    pass
```
Here, we specify the name of our audio file. We then use sr.AudioFile() within a with statement. This is an important practice because it automatically handles opening and closing the file, ensuring resources are managed correctly. The AudioFile object, which we call source, represents our opened audio file.
```python
audio_data = r.record(source)
```
Inside the with block, we call r.record(source). This method reads the audio from the source object (by default, the entire file) and returns it as an AudioData object. The audio_data variable now holds the audio in a format that the recognizer can work with.
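A quick way to sanity-check what record() captured is to compute the duration of the returned audio. An AudioData object exposes its raw bytes, sample rate, and sample width as frame_data, sample_rate, and sample_width. For PCM audio, each frame holds one sample of sample_width bytes per channel, so the duration in seconds is the byte count divided by (sample_width × channels × sample_rate). The helper below is a hypothetical sketch that takes those values directly, so it runs without the library installed.

```python
def audio_duration_seconds(frame_data: bytes, sample_rate: int,
                           sample_width: int, channels: int = 1) -> float:
    """Duration of raw PCM audio in seconds (hypothetical helper).

    Each frame holds one sample of `sample_width` bytes per channel,
    so the frame count is the byte count divided by (sample_width * channels).
    """
    if sample_rate <= 0 or sample_width <= 0 or channels <= 0:
        raise ValueError("sample_rate, sample_width and channels must be positive")
    bytes_per_frame = sample_width * channels
    return len(frame_data) / bytes_per_frame / sample_rate
```

With the script above, audio_duration_seconds(audio_data.frame_data, audio_data.sample_rate, audio_data.sample_width) should come out close to the length of test-audio.wav. If you only need part of a file, record() also accepts offset and duration arguments in seconds, for example r.record(source, offset=1.0, duration=2.0); because the file is read in fixed-size chunks, the captured length may not match the requested duration exactly.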
```python
try:
    text = r.recognize_google(audio_data)
    print(f"Transcription: {text}")
except sr.UnknownValueError:
    print("Google Speech Recognition could not understand the audio.")
except sr.RequestError as e:
    print(f"Could not request results from Google Speech Recognition service; {e}")
```
This is where the actual transcription happens. Because we are making a request over the network to an external service, things can go wrong. The audio might be noisy, the service might be down, or your internet connection could fail. Using a try...except block makes our script more resilient.
- r.recognize_google(audio_data): This is the method that does the work. It sends audio_data to Google's Web Speech API and waits for a response. If the request succeeds, it returns the transcribed text as a string. By default it assumes US English ("en-US"); for other languages, pass a language code through its language parameter, for example r.recognize_google(audio_data, language="fr-FR").
- except sr.UnknownValueError: This exception is raised when the API cannot understand what was said. This can happen if the audio is silent, contains too much background noise, or is in a language other than the one the request specified.
- except sr.RequestError: This exception is raised when the request to the API fails, for example because there is no internet access or the service itself has a problem.

To run the script, save it and execute it from your terminal. Run it from the directory that contains test-audio.wav, since the script opens the file by a relative path.
```bash
python transcribe_file.py
```
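A relative path like "test-audio.wav" is looked up in the terminal's current working directory, not in the folder that holds the script, so launching the script from another folder fails because the file cannot be found. One option, sketched below with the standard pathlib module, is to build the path from the script's own location; path_next_to is a hypothetical helper name.

```python
from pathlib import Path


def path_next_to(script_file: str, filename: str) -> Path:
    """Return the absolute path of `filename` in the same folder as `script_file`."""
    # resolve() makes the path absolute (and follows symlinks) before taking its folder
    return Path(script_file).resolve().parent / filename
```

In transcribe_file.py you would then write audio_file = str(path_next_to(__file__, "test-audio.wav")); converting with str() keeps audio_file a plain string, as in the original script.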
If everything works correctly, you should see output similar to this:
```text
Transcribing audio...
Transcription: hello this is a test
```
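A RequestError often signals a transient problem, such as a dropped connection, so retrying a couple of times with a short pause can be worthwhile. An UnknownValueError, by contrast, will usually recur for the same audio, so retrying it is pointless. The sketch below is a generic, hypothetical helper (call_with_retries is not part of SpeechRecognition): it takes any zero-argument function plus the exception types worth retrying, which also lets it be tested without network access.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    func: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    attempts: int = 3,
    base_delay: float = 1.0,
) -> T:
    """Call func(), retrying on the given exceptions with exponential backoff.

    Hypothetical helper: waits base_delay, 2*base_delay, ... between tries
    and re-raises the last exception once all attempts are used up.
    Exceptions not listed in retry_on propagate immediately.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller handle it
            time.sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable")
```

In the transcription script, the call would become text = call_with_retries(lambda: r.recognize_google(audio_data), retry_on=(sr.RequestError,)), kept inside the same try...except so that an UnknownValueError, or a RequestError that persists after the last attempt, is still reported as before.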
Congratulations. You have successfully written a program to convert spoken language from an audio file into text. This simple script is a powerful starting point. In the next section, we will adapt this code to handle live audio input directly from your microphone.
© 2026 ApX Machine Learning