Practice: Build a Simple Voice Command Tool

Combining audio input with transcription makes it possible to build interactive applications. In this section, we build a simple voice command tool that listens for a 'wake word' and then responds to specific spoken instructions. It ties together audio capture, transcription, and basic Python logic into a working program that reacts to your voice.

The Application's Logic

Before writing any code, it is helpful to outline the program's flow. Our voice command tool will operate in a continuous loop, following these steps:

Listen for a Wake Word: The program will passively listen for a specific phrase, like "Hey computer".
Activate and Acknowledge: Once the wake word is detected, the program will signal that it is ready for a command, perhaps by saying "Yes?".
Listen for a Command: The program will then listen for the user's next instruction, such as "What time is it?".
Execute the Command: It will process the transcribed text to identify the command and perform the corresponding action.
Return to Listening: After executing the command, the program will go back to step 1, waiting for the wake word again.

This cycle creates a simple but effective interaction model for a voice-controlled application.

Figure: The program's execution flow. The program remains in a listening state until the wake word activates it. It then processes a single command and returns to the initial state.
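The five steps can be sketched as a small two-state machine. This is a minimal sketch, not part of the final script: `State` and `run_flow` are hypothetical names, and a list of already-transcribed strings stands in for the microphone so the control flow can be exercised without audio hardware.

```python
from enum import Enum, auto


class State(Enum):
    """The two states in the flow above (hypothetical names)."""
    WAITING_FOR_WAKE_WORD = auto()
    WAITING_FOR_COMMAND = auto()


def run_flow(utterances, wake_word="hey computer"):
    """Walk through transcribed utterances and return the events the tool would produce."""
    state = State.WAITING_FOR_WAKE_WORD
    events = []
    for text in utterances:
        text = text.lower()
        if state is State.WAITING_FOR_WAKE_WORD:
            # Steps 1-2: ignore everything until the wake word appears
            if wake_word in text:
                events.append("acknowledge")  # e.g. say "Yes?"
                state = State.WAITING_FOR_COMMAND
        else:
            # Steps 3-4: treat the next utterance as the command
            events.append(f"execute: {text}")
            # Step 5: return to waiting for the wake word
            state = State.WAITING_FOR_WAKE_WORD
    return events


print(run_flow(["hello", "hey computer", "what time is it", "what time is it"]))
# ['acknowledge', 'execute: what time is it']
```

The second "what time is it" is ignored because the program has already returned to waiting for the wake word.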

Setting Up the Tools

For this application, we will need a few libraries. You should have SpeechRecognition and PyAudio installed from the previous sections. We will also add pyttsx3, a text-to-speech library that allows our program to talk back to us.

If you have not installed pyttsx3 yet, you can do so using pip:

pip install pyttsx3

On Linux, pyttsx3 uses the eSpeak speech synthesizer. If it is not already present, you may need to install espeak (or espeak-ng) through your system's package manager.
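Before writing the assistant, a quick check can confirm that the required packages are importable. This is a minimal sketch using the standard library's `importlib.util.find_spec`; `check_modules` is a hypothetical helper. It only confirms that the modules can be found. It does not test that a microphone or speech engine actually works. Note that SpeechRecognition is imported as `speech_recognition` and PyAudio as `pyaudio`.

```python
import importlib.util

# Import names (not pip names) of the third-party packages used in this section
REQUIRED_MODULES = ["speech_recognition", "pyaudio", "pyttsx3"]


def check_modules(names):
    """Return a dict mapping each module name to True if it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_modules(REQUIRED_MODULES).items():
        print(f"{name:20} {'OK' if found else 'MISSING'}")
```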

We will also use two modules from Python's standard library: datetime, to get the current time, and webbrowser, to open a page in the default web browser.
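The complete script formats the time with `strftime("%I:%M %p")`. Because `%I` is zero-padded, 9:05 becomes "09:05 AM", which some speech engines may read aloud awkwardly. The sketch below uses a hypothetical helper, `spoken_time`, that strips the leading zero. The AM/PM output shown assumes the default C locale, since `%p` is locale-dependent. The `webbrowser.open` call is left commented out because it launches a browser.

```python
import datetime


def spoken_time(moment):
    """Format a datetime as 12-hour clock text without a leading zero."""
    # %I: zero-padded 12-hour clock, %M: minutes, %p: AM/PM
    return moment.strftime("%I:%M %p").lstrip("0")


print(spoken_time(datetime.datetime(2024, 1, 1, 9, 5)))    # 9:05 AM
print(spoken_time(datetime.datetime(2024, 1, 1, 14, 30)))  # 2:30 PM
print(spoken_time(datetime.datetime.now()))

# webbrowser.open(url) asks the operating system to open url in the default browser:
# import webbrowser
# webbrowser.open("https://www.google.com")
```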

Building the Tool, Step by Step

Let's assemble the code. We will create a single Python script. The structure will include an initialization phase for the recognizer and text-to-speech engine, followed by the main loop that drives the application.

1. Initialization

First, we import the necessary libraries and set up our recognizer and text-to-speech engine. The engine allows the program to convert text strings into spoken audio.

import speech_recognition as sr
import pyttsx3
import datetime
import webbrowser

# Initialize the recognizer
r = sr.Recognizer()

# Initialize the text-to-speech engine
engine = pyttsx3.init()

# Define the wake word
WAKE_WORD = "hey computer"

2. The Main Application Loop

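The core of our program is a while loop that continuously listens for the wake word. The actual listening and transcription happen in a helper function, listen_for_audio(), which uses a try...except block to handle cases where the speech cannot be understood or the recognition service cannot be reached. In either case, it returns an empty string so the loop can simply continue.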

When the wake word is detected in the transcribed text, we activate the command-listening phase.

def listen_for_audio():
    """Listens for audio and returns it as transcribed text."""
    with sr.Microphone() as source:
        print("Listening...")
        # Adjust for ambient noise to improve accuracy
        r.adjust_for_ambient_noise(source)
        audio = r.listen(source)

    try:
        # Recognize speech using Google's engine
        text = r.recognize_google(audio).lower()
        print(f"You said: {text}")
        return text
    except sr.UnknownValueError:
        # Speech was unintelligible
        print("Sorry, I did not understand that.")
        return ""
    except sr.RequestError:
        # API was unreachable or unresponsive
        print("Sorry, my speech service is down.")
        return ""

# Main loop
while True:
    text = listen_for_audio()

    if WAKE_WORD in text:
        engine.say("Yes?")
        engine.runAndWait()

        # Now listen for the actual command
        command_text = listen_for_audio()

        # This is where we will process the command
        # (to be implemented next)

In this structure, the program prints "Listening..." and waits. After you speak, it transcribes your speech. If the phrase "hey computer" is present, it responds with "Yes?" and then immediately calls listen_for_audio() again to capture the command that follows.
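The check `WAKE_WORD in text` is a plain substring test. It also fires on "hey computers", and it misses "hey, computer" if the recognizer inserts punctuation. The following is a minimal sketch of a stricter matcher, assuming transcriptions are plain English text. `split_wake_word` is a hypothetical helper that also returns any command spoken in the same breath, such as "hey computer what time is it".

```python
import re


def split_wake_word(text, wake_word="hey computer"):
    """Return (heard_wake_word, remaining_command) for one transcription."""
    # Lowercase, turn punctuation into spaces, and collapse runs of whitespace,
    # so "Hey, computer!" normalizes to "hey computer"
    normalized = " ".join(re.sub(r"[^\w\s']", " ", text.lower()).split())
    # \b word boundaries stop matches inside longer words ("hey computers")
    pattern = r"\b" + r"\s+".join(map(re.escape, wake_word.split())) + r"\b"
    match = re.search(pattern, normalized)
    if match is None:
        return False, ""
    # Anything after the wake word is treated as the command
    return True, normalized[match.end():].strip()


print(split_wake_word("Hey, computer! What time is it?"))  # (True, 'what time is it')
```

If the returned command is non-empty, the loop could pass it straight to command processing instead of prompting with "Yes?" and listening again.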

3. Processing Commands

Now, let's add the logic to handle different commands. We will use a simple set of if/elif/else statements to check for keywords in the transcribed command.

Here is the complete script, including the command processing logic.

import speech_recognition as sr
import pyttsx3
import datetime
import webbrowser

# Initialize the recognizer
r = sr.Recognizer()

# Initialize the text-to-speech engine
engine = pyttsx3.init()

# Define the wake word
WAKE_WORD = "hey computer"

def speak(text):
    """Function to convert text to speech."""
    print(f"Computer: {text}")
    engine.say(text)
    engine.runAndWait()

def listen_for_audio():
    """Listens for audio and returns it as transcribed text."""
    with sr.Microphone() as source:
        # Briefly sample background noise on every call to calibrate the energy threshold
        r.adjust_for_ambient_noise(source, duration=0.5)
        print("Listening...")
        audio = r.listen(source)

    try:
        # Recognize speech using Google's online service
        text = r.recognize_google(audio).lower()
        print(f"You said: {text}")
        return text
    except sr.UnknownValueError:
        # This error means the library could not understand the audio
        return ""
    except sr.RequestError:
        # This error means there was a problem with the API request
        speak("Sorry, I'm having trouble connecting to the speech service.")
        return ""

def process_command(command):
    """Processes the command and performs the corresponding action."""
    if "what time is it" in command:
        now = datetime.datetime.now().strftime("%I:%M %p")
        speak(f"The current time is {now}.")

    elif "open browser" in command:
        speak("Opening your web browser.")
        webbrowser.open("https://www.google.com")

    elif "stop listening" in command or "goodbye" in command:
        speak("Goodbye!")
        return True # Signal to exit the loop

    else:
        speak("I'm not sure how to help with that.")

    return False # Signal to continue listening

# --- Main Application Loop ---
speak("I am ready. Say the wake word to begin.")

while True:
    print("\nListening for wake word...")
    text_input = listen_for_audio()

    if WAKE_WORD in text_input:
        speak("Yes? How can I help?")
        command = listen_for_audio()

        if command:
            should_exit = process_command(command)
            if should_exit:
                break
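The script above handles commands with an if/elif chain in process_command(). That works well for three commands. As the list grows, one common alternative is a table that maps trigger phrases to handler functions. This is a minimal sketch of that alternative, not what the script uses. The handler names are hypothetical, and they return strings instead of speaking, so the dispatch logic can be checked without pyttsx3.

```python
import datetime


def tell_time(command):
    """Hypothetical handler: report the current time."""
    return "The current time is " + datetime.datetime.now().strftime("%I:%M %p") + "."


def say_goodbye(command):
    """Hypothetical handler: farewell message."""
    return "Goodbye!"


# Ordered (trigger phrase, handler) pairs; the first phrase found in the command wins
COMMANDS = [
    ("what time is it", tell_time),
    ("stop listening", say_goodbye),
    ("goodbye", say_goodbye),
]


def dispatch(command):
    """Return the reply for the first trigger phrase contained in command."""
    for phrase, handler in COMMANDS:
        if phrase in command:
            return handler(command)
    return "I'm not sure how to help with that."


print(dispatch("goodbye"))          # Goodbye!
print(dispatch("play some music"))  # I'm not sure how to help with that.
```

With this design, adding a command means writing one function and adding one line to COMMANDS, rather than extending the if/elif chain.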

Running Your Voice Command Tool

Save the code above into a file named voice_assistant.py.
Open your terminal or command prompt.
Navigate to the directory where you saved the file.
Run the script with the command: python voice_assistant.py

The program will say "I am ready. Say the wake word to begin." (the same text is printed in the terminal) and start listening. Try saying "Hey computer." It should respond "Yes? How can I help?" Then give it a command like "What time is it?" or "Open browser." To stop the program, say "Hey computer," wait for the response, and then say "Stop listening" or "Goodbye."
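You can also stop the script at any time with Ctrl+C, which raises KeyboardInterrupt. If it is not handled, Python prints a traceback on exit. The following is a minimal sketch of a wrapper that exits cleanly. `run_until_exit` and the `step` callable are hypothetical: `step` would perform one wake-word and command cycle and return True when the user asks to stop.

```python
def run_until_exit(step):
    """Call step() repeatedly until it returns True or the user presses Ctrl+C.

    Returns a short string describing why the loop ended.
    """
    try:
        while True:
            if step():
                return "user said goodbye"
    except KeyboardInterrupt:
        # Raised when Ctrl+C is pressed while step() is running
        return "interrupted"


# Simulated session: two ordinary cycles, then the user says "goodbye"
answers = iter([False, False, True])
print(run_until_exit(lambda: next(answers)))  # user said goodbye
```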

Suggestions for Improvement

This simple tool is a great starting point. Here are a few ideas for how you could expand upon it:

Add More Commands: Integrate with other libraries or APIs to add more functionality. You could have it fetch the weather, read headlines from a news site, or control smart home devices.
Improve Command Parsing: Instead of simple if "keyword" in command checks, you could use regular expressions or a more advanced intent-parsing library for more flexible command recognition.
Offline Recognition: For better privacy, or to work without an internet connection, you could experiment with an offline recognizer such as recognize_sphinx(). It requires installing the PocketSphinx package separately and is generally less accurate than online services.
Customizable Wake Word: Allow the user to set their own wake word instead of having it hardcoded in the script.
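As a sketch of the command-parsing idea above, regular expressions can accept several phrasings of the same intent and extract parameters from the command. The intent names and patterns below are hypothetical examples, not part of any library.

```python
import re

# Each intent is paired with a compiled pattern; named groups capture parameters
INTENTS = [
    ("time", re.compile(r"\bwhat(?:'s| is)? the time\b|\bwhat time is it\b")),
    ("search", re.compile(r"\bsearch (?:for )?(?P<query>.+)")),
    ("open_site", re.compile(r"\bopen (?P<site>[a-z0-9.-]+\.[a-z]{2,})\b")),
    ("exit", re.compile(r"\b(?:stop listening|goodbye|exit)\b")),
]


def parse_intent(command):
    """Return (intent_name, parameters) for the first matching pattern."""
    for name, pattern in INTENTS:
        match = pattern.search(command.lower())
        if match:
            return name, match.groupdict()
    return None, {}


print(parse_intent("search for python tutorials"))  # ('search', {'query': 'python tutorials'})
```

The main loop would then branch on the intent name, for example building a URL from `params["site"]` to pass to `webbrowser.open`.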

