Dividing an audio signal into frames yields a series of short audio segments. This common processing step, however, introduces a challenge: each frame now has an abrupt start and end, an artificial discontinuity that is absent from the original, continuous audio wave.
If we analyzed the frequencies in these frames directly, the sharp edges would introduce spurious frequency content that was never present in the original speech. This phenomenon is called spectral leakage: energy from a specific frequency "leaks" into neighboring frequencies, distorting the true frequency content of the signal. To get an accurate representation, we must first smooth out these edges.
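Leakage is easy to demonstrate numerically. The sketch below, using NumPy, compares the spectrum of a raw (rectangular) frame against the same frame after windowing; the 440 Hz tone, 16 kHz sample rate, and 512-sample frame are illustrative choices, not values from the text. The tone is chosen so it does not complete a whole number of cycles in the frame, which is exactly the situation that causes leakage.

```python
import numpy as np

# A 440 Hz tone at 16 kHz, in a 512-sample frame. The bin spacing is
# 16000 / 512 = 31.25 Hz, so the tone falls between bins (440 / 31.25
# is not an integer) and its energy leaks across the spectrum.
sample_rate = 16000
n = 512
t = np.arange(n) / sample_rate
frame = np.sin(2 * np.pi * 440 * t)

spectrum_rect = np.abs(np.fft.rfft(frame))                   # no window
spectrum_hamming = np.abs(np.fft.rfft(frame * np.hamming(n)))  # windowed

# Measure the leaked energy well away from the tone's peak bin.
peak = np.argmax(spectrum_rect)
far_rect = spectrum_rect[peak + 50:].max()
far_hamming = spectrum_hamming[peak + 50:].max()

print(far_hamming < far_rect)  # True: windowing suppresses the leakage
```

The windowed spectrum still has a main peak at the tone's frequency, but the energy smeared into distant bins drops sharply, which is the whole point of the window.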
A window function is a mathematical function that we apply to each frame to solve the problem of spectral leakage. Its purpose is to reduce the amplitude of the signal at the beginning and end of the frame, tapering it smoothly towards zero. You can think of it as gently fading each frame in at the beginning and fading it out at the end.
By multiplying the frame's audio data with a window function, we minimize the sharp discontinuities at the boundaries. This results in a signal that is much better suited for frequency analysis, which is a critical next step in feature extraction.
While several types of window functions exist, such as Hann and Blackman, a very common and effective choice for speech recognition is the Hamming window. The Hamming window has a shape that is close to one in the middle and smoothly tapers toward small, non-zero values at the edges.
The process is straightforward: each sample point in the audio frame is multiplied by the corresponding sample point of the window function.
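This element-wise multiplication is a one-liner in practice. Here is a minimal sketch with NumPy; the 400-sample frame (25 ms at 16 kHz) and the 300 Hz test tone are illustrative assumptions.

```python
import numpy as np

# A hypothetical 400-sample frame: 25 ms of a 300 Hz tone at 16 kHz.
sample_rate = 16000
frame_length = 400
t = np.arange(frame_length) / sample_rate
frame = np.sin(2 * np.pi * 300 * t)

# Hamming window of the same length: w[n] = 0.54 - 0.46*cos(2*pi*n/(N-1))
window = np.hamming(frame_length)

# Each sample of the frame is multiplied by the matching window sample.
windowed = frame * window

# The windowed frame now starts and ends near zero.
print(abs(windowed[0]) < 0.1, abs(windowed[-1]) < 0.1)
```

Note that the Hamming window's edge values are small but non-zero (about 0.08), so the windowed frame tapers toward zero without being forced exactly to it.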
Let's visualize this process. First, imagine we have a single audio frame with sharp edges.
An audio frame sliced from a signal. Note the abrupt start and end values, which are not zero.
Next, we have the Hamming window, which has the same length as our frame.
A Hamming window. Its values are highest in the middle and taper toward the edges.
Finally, we perform an element-wise multiplication of the frame and the window. The resulting "windowed" frame now starts and ends near zero, creating a much smoother segment.
The audio frame after applying the Hamming window. The signal now tapers smoothly at both ends.
Windowing helps explain why we use overlapping frames, a topic from the previous section. Since a window function reduces the amplitude of the signal at the edges of each frame, we risk losing the information contained in those parts.
By overlapping the frames, we ensure that the samples de-emphasized at the edges of one frame fall near the center of a neighboring frame, where the window's value is highest. This guarantees that no part of the audio signal is ignored during our analysis.
The overlap between frames ensures that information reduced at the end of one windowed frame is properly analyzed in the subsequent frame.
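Framing and windowing are usually combined in a single pass. The sketch below slices a signal into overlapping frames and windows each one; the `frame_signal` helper and its 25 ms frame / 10 ms hop defaults (at 16 kHz) are illustrative assumptions, not values fixed by the text.

```python
import numpy as np

def frame_signal(signal, frame_length=400, hop_length=160):
    """Slice `signal` into overlapping, Hamming-windowed frames.

    With hop_length < frame_length, consecutive frames overlap, so the
    samples tapered at one frame's edge sit near the next frame's center.
    """
    num_frames = 1 + (len(signal) - frame_length) // hop_length
    window = np.hamming(frame_length)
    frames = np.empty((num_frames, frame_length))
    for i in range(num_frames):
        start = i * hop_length
        frames[i] = signal[start:start + frame_length] * window
    return frames

signal = np.random.randn(16000)  # one second of noise as stand-in audio
frames = frame_signal(signal)
print(frames.shape)  # (98, 400)
```

With a 160-sample hop and 400-sample frames, each frame overlaps its neighbor by 240 samples (60%), so every region of the signal appears near full weight in at least one frame.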
With our audio now framed and windowed, the signal is properly prepared for the next and most important stage of preprocessing: extracting features that a machine learning model can use to distinguish between different sounds.