Audio input and transcription are enough to build an interactive application. We will build a simple voice command tool that listens for a 'wake word' and then responds to specific instructions, combining audio capture, transcription, and basic Python logic into a program that reacts to your voice.
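Before writing any code, it is helpful to outline the program's flow. Our voice command tool will operate in a continuous loop, following these steps:
1. Listen to the microphone and transcribe whatever is heard.
2. Check the transcription for the wake word. If it is absent, go back to step 1.
3. Acknowledge the wake word with a spoken response.
4. Listen again and transcribe the command that follows.
5. Carry out the command, then return to step 1.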
This cycle creates a simple but effective interaction model for a voice-controlled application.
The program's execution flow. It remains in a listening state until a wake word activates it, after which it processes a single command and returns to the initial state.
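The flow can be tried out without any audio hardware. The sketch below is only an illustration: it replaces the microphone and recognizer with a list of already-transcribed phrases, and the names State and run_flow are hypothetical helpers, not part of the tool built later in this section.

```python
from enum import Enum, auto

WAKE_WORD = "hey computer"  # same wake word used later in this section


class State(Enum):
    WAITING_FOR_WAKE_WORD = auto()
    WAITING_FOR_COMMAND = auto()


def run_flow(transcripts):
    """Walk the listen/wake/command cycle over pre-transcribed phrases.

    `transcripts` stands in for what the microphone and recognizer would
    produce (a hypothetical test input). Returns the commands that would
    have been processed.
    """
    state = State.WAITING_FOR_WAKE_WORD
    handled = []
    for text in transcripts:
        text = text.lower()
        if state is State.WAITING_FOR_WAKE_WORD:
            # Ignore everything until the wake word is heard
            if WAKE_WORD in text:
                state = State.WAITING_FOR_COMMAND
        else:
            # Process a single command, then return to the initial state
            if text:
                handled.append(text)
            state = State.WAITING_FOR_WAKE_WORD
    return handled


phrases = ["nice weather", "hey computer", "what time is it", "open browser"]
print(run_flow(phrases))  # ['what time is it']
```

The last phrase is ignored because the program has already returned to waiting for the wake word.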
For this application, we will need a few libraries. You should have SpeechRecognition and PyAudio installed from the previous sections. We will also add pyttsx3, a text-to-speech library that allows our program to talk back to us.
If you have not installed pyttsx3 yet, you can do so using pip:
pip install pyttsx3
You may also need to install espeak or another TTS engine on Linux systems if it is not already present.
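Two of these packages are imported under a different name than the one used with pip (SpeechRecognition is imported as speech_recognition, PyAudio as pyaudio). A minimal sketch for checking that everything is importable before continuing; REQUIRED and find_missing are hypothetical names chosen for this illustration:

```python
import importlib.util

# pip package name -> module name used in import statements
REQUIRED = {
    "SpeechRecognition": "speech_recognition",
    "PyAudio": "pyaudio",
    "pyttsx3": "pyttsx3",
}


def find_missing(required=REQUIRED):
    """Return the pip names of packages whose modules cannot be found."""
    return [
        pip_name
        for pip_name, module_name in required.items()
        if importlib.util.find_spec(module_name) is None
    ]


missing = find_missing()
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All voice packages are installed.")
```

This only confirms that the modules can be located; it does not test the microphone or the TTS engine.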
We will also use two of Python's built-in libraries: datetime to get the current time and webbrowser to open a web browser.
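As a quick preview of these two modules, here is a small sketch. The helper names format_time and open_page are assumptions made for this example; the time format string matches the one used in the full script later on.

```python
import datetime
import webbrowser


def format_time(moment):
    """Format a datetime as a 12-hour clock string suitable for speaking."""
    return moment.strftime("%I:%M %p")


def open_page(url="https://www.google.com"):
    """Ask the default browser to open `url`."""
    return webbrowser.open(url)


print(format_time(datetime.datetime.now()))
# open_page()  # uncomment to launch the browser
```

The browser call is left commented out so that running the snippet does not open a window unexpectedly.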
Let's assemble the code. We will create a single Python script. The structure will include an initialization phase for the recognizer and text-to-speech engine, followed by the main loop that drives the application.
First, we import the necessary libraries and set up our recognizer and text-to-speech engine. The engine allows the program to convert text strings into spoken audio.
import speech_recognition as sr
import pyttsx3
import datetime
import webbrowser
# Initialize the recognizer
r = sr.Recognizer()
# Initialize the text-to-speech engine
engine = pyttsx3.init()
# Define the wake word
WAKE_WORD = "hey computer"
The core of our program is a while loop that continuously listens for the wake word. The listening itself happens in a helper function, listen_for_audio(), which uses a try...except block to gracefully handle moments when the microphone does not pick up any clear speech or the speech service cannot be reached.
When the wake word is detected in the transcribed text, we activate the command-listening phase.
def listen_for_audio():
    """Listens for audio and returns it as transcribed text."""
    with sr.Microphone() as source:
        print("Listening...")
        # Adjust for ambient noise to improve accuracy
        r.adjust_for_ambient_noise(source)
        audio = r.listen(source)

    try:
        # Recognize speech using Google's engine
        text = r.recognize_google(audio).lower()
        print(f"You said: {text}")
        return text
    except sr.UnknownValueError:
        # Speech was unintelligible
        print("Sorry, I did not understand that.")
        return ""
    except sr.RequestError:
        # API was unreachable or unresponsive
        print("Sorry, my speech service is down.")
        return ""


# Main loop
while True:
    text = listen_for_audio()
    if WAKE_WORD in text:
        engine.say("Yes?")
        engine.runAndWait()
        # Now listen for the actual command
        command_text = listen_for_audio()
        # This is where we will process the command
        # (to be implemented next)
In this structure, the program prints "Listening..." and waits. After you speak, it transcribes your speech. If the phrase "hey computer" is present, it responds with "Yes?" and then immediately calls listen_for_audio() again to capture the command that follows.
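One consequence of this design is that anything said in the same breath as the wake word (for example, "hey computer what time is it") is not treated as a command. A possible refinement, shown here only as a sketch, is to separate the wake word from any text that follows it. The helper split_wake_word is a hypothetical addition and is not used in the script in this section.

```python
def split_wake_word(text, wake_word="hey computer"):
    """Return (heard, remainder) for a transcribed phrase.

    `heard` is True when the wake word appears in `text`; `remainder` is
    whatever was said after it, with surrounding spaces and punctuation
    stripped.
    """
    text = text.lower()
    index = text.find(wake_word)
    if index == -1:
        return False, ""
    remainder = text[index + len(wake_word):].strip(" ,.!?")
    return True, remainder


print(split_wake_word("hey computer what time is it"))  # (True, 'what time is it')
print(split_wake_word("hey computer"))                  # (True, '')
print(split_wake_word("good morning"))                  # (False, '')
```

With a helper like this, the main loop could act on a non-empty remainder directly and call listen_for_audio() a second time only when the remainder is empty.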
Now, let's add the logic to handle different commands. We will use a simple set of if/elif/else statements to check for keywords in the transcribed command.
Here is the complete script, including the command processing logic.
import speech_recognition as sr
import pyttsx3
import datetime
import webbrowser

# Initialize the recognizer
r = sr.Recognizer()

# Initialize the text-to-speech engine
engine = pyttsx3.init()

# Define the wake word
WAKE_WORD = "hey computer"


def speak(text):
    """Function to convert text to speech."""
    print(f"Computer: {text}")
    engine.say(text)
    engine.runAndWait()


def listen_for_audio():
    """Listens for audio and returns it as transcribed text."""
    with sr.Microphone() as source:
        # We adjust for ambient noise once at the start of listening
        r.adjust_for_ambient_noise(source, duration=0.5)
        print("Listening for a command...")
        audio = r.listen(source)

    try:
        # Recognize speech using Google's online service
        text = r.recognize_google(audio).lower()
        print(f"You said: {text}")
        return text
    except sr.UnknownValueError:
        # This error means the library could not understand the audio
        return ""
    except sr.RequestError:
        # This error means there was a problem with the API request
        speak("Sorry, I'm having trouble connecting to the speech service.")
        return ""


def process_command(command):
    """Processes the command and performs the corresponding action."""
    if "what time is it" in command:
        now = datetime.datetime.now().strftime("%I:%M %p")
        speak(f"The current time is {now}.")
    elif "open browser" in command:
        speak("Opening your web browser.")
        webbrowser.open("https://www.google.com")
    elif "stop listening" in command or "goodbye" in command:
        speak("Goodbye!")
        return True  # Signal to exit the loop
    else:
        speak("I'm not sure how to help with that.")
    return False  # Signal to continue listening


# --- Main Application Loop ---
speak("I am ready. Say the wake word to begin.")

while True:
    print("\nListening for wake word...")
    text_input = listen_for_audio()
    if WAKE_WORD in text_input:
        speak("Yes? How can I help?")
        command = listen_for_audio()
        if command:
            should_exit = process_command(command)
            if should_exit:
                break
Save the script as voice_assistant.py and run it from your terminal:
python voice_assistant.py
The program will announce "I am ready" and start listening. Try saying, "Hey computer." It should respond, "Yes? How can I help?" Then, give it a command like, "What time is it?" or "Open browser." To stop the program, say, "Hey computer," wait for the response, and then say, "Stop listening."
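If you want to check the command logic without a microphone or speakers, one option is to pass the side effects in as arguments. The sketch below is a hypothetical variant of process_command, not the version in the script: the speak, open_url, and now parameters are assumptions introduced so that the function can be driven with plain Python lists.

```python
import datetime


def process_command(command, speak, open_url, now=datetime.datetime.now):
    """Variant of process_command with its side effects passed in.

    `speak` and `open_url` are callables supplied by the caller, and `now`
    returns the current datetime. Returns True when the loop should exit.
    """
    if "what time is it" in command:
        speak(f"The current time is {now().strftime('%I:%M %p')}.")
    elif "open browser" in command:
        speak("Opening your web browser.")
        open_url("https://www.google.com")
    elif "stop listening" in command or "goodbye" in command:
        speak("Goodbye!")
        return True
    else:
        speak("I'm not sure how to help with that.")
    return False


def fixed_time():
    """A fixed clock so the spoken time is predictable in this demo."""
    return datetime.datetime(2024, 1, 1, 9, 30)


# Record what would be spoken and opened instead of using real hardware
spoken, opened = [], []
results = [
    process_command("what time is it", spoken.append, opened.append, fixed_time),
    process_command("open browser", spoken.append, opened.append),
    process_command("goodbye", spoken.append, opened.append),
]
print(results)  # [False, False, True]
print(spoken)
print(opened)   # ['https://www.google.com']
```

In the real script, the same function could be called with the speak() helper and webbrowser.open as arguments.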
This simple tool is a great starting point. Here are a few ideas for how you could expand upon it:
Instead of simple if "keyword" in command checks, you could use regular expressions or a more advanced intent-parsing library for more flexible command recognition.
For offline use, the SpeechRecognition library also provides recognize_sphinx(), though it may require more configuration and be less accurate than online services.
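As a rough illustration of the regular-expression idea, the sketch below maps a few phrasings onto named intents. The intent names, the patterns, and the parse_intent helper are all assumptions made for this example. A real tool would need patterns tuned to how its users actually speak.

```python
import re

# (intent name, compiled pattern) pairs; checked in order
INTENT_PATTERNS = [
    ("tell_time", re.compile(r"\bwhat(?:'s| is) the time\b|\bwhat time is it\b")),
    ("open_browser", re.compile(r"\b(?:open|launch|start) (?:the |my |a )?(?:web )?browser\b")),
    ("exit", re.compile(r"\b(?:stop listening|goodbye|bye)\b")),
]


def parse_intent(command):
    """Return the name of the first intent whose pattern matches, or None."""
    command = command.lower()
    for intent, pattern in INTENT_PATTERNS:
        if pattern.search(command):
            return intent
    return None


print(parse_intent("What time is it?"))              # tell_time
print(parse_intent("please launch my web browser"))  # open_browser
print(parse_intent("play some music"))               # None
```

The returned intent name could then select the action to run, replacing the chain of keyword checks in process_command.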