Having explored the theoretical underpinnings of sinusoidal positional encodings and why they are necessary to inform the Transformer about sequence order, let's put this into practice. Understanding how these encodings behave visually can provide valuable intuition. In this section, we will implement the positional encoding function and visualize the resulting vectors.
We will use Python with NumPy for numerical computation and Plotly for interactive visualizations, which are well-suited for web-based course materials.
First, let's translate the mathematical formulas for sinusoidal positional encoding into code. Recall the formulas:
$$PE_{(pos,\,2i)} = \sin\!\left(\frac{pos}{10000^{2i/d_{model}}}\right) \qquad PE_{(pos,\,2i+1)} = \cos\!\left(\frac{pos}{10000^{2i/d_{model}}}\right)$$
where $pos$ is the position in the sequence, $i$ indexes each sine/cosine pair of dimensions (so $2i$ and $2i+1$ are dimension indices within the embedding vector), and $d_{model}$ is the dimensionality of the embedding.
Here's a Python function using NumPy to generate these encodings:
import numpy as np

def get_positional_encoding(max_seq_len, d_model):
    """
    Generates sinusoidal positional encodings.

    Args:
        max_seq_len: Maximum sequence length.
        d_model: Dimensionality of the model embedding.

    Returns:
        A numpy array of shape (max_seq_len, d_model) containing
        the positional encodings.
    """
    if d_model % 2 != 0:
        raise ValueError("d_model must be an even number to accommodate sin/cos pairs.")

    # Initialize the positional encoding matrix
    pos_encoding = np.zeros((max_seq_len, d_model))

    # Create a column vector of positions [0, 1, ..., max_seq_len-1]
    position = np.arange(max_seq_len)[:, np.newaxis]  # Shape: (max_seq_len, 1)

    # Calculate the division term: 1 / (10000^(2i / d_model))
    # Corresponds to i = 0, 1, ..., d_model/2 - 1
    div_term = np.exp(np.arange(0, d_model, 2) * -(np.log(10000.0) / d_model))  # Shape: (d_model/2,)

    # Apply sin to even indices (2i)
    pos_encoding[:, 0::2] = np.sin(position * div_term)

    # Apply cos to odd indices (2i + 1)
    pos_encoding[:, 1::2] = np.cos(position * div_term)

    return pos_encoding

# Example Usage:
max_len = 50   # Maximum sequence length
d_model = 128  # Embedding dimension (must be even)

positional_encodings = get_positional_encoding(max_len, d_model)
print(f"Shape of generated positional encodings: {positional_encodings.shape}")
# Output: Shape of generated positional encodings: (50, 128)
This function takes the maximum sequence length and the model's embedding dimension as input. It calculates the sine values for even indices and the cosine values for odd indices based on the position and the div_term, which represents the frequency component. The result is a matrix in which each row corresponds to a position in the sequence and each column corresponds to a dimension of the positional encoding vector.
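As a quick check on the frequency interpretation of div_term: the sinusoid in dimension pair i has wavelength 2π · 10000^(2i/d_model) with respect to pos, so the wavelengths form a geometric progression from 2π up to just under 2π · 10000. A small sketch (using the same d_model as above):

import numpy as np

d_model = 128
i = np.arange(d_model // 2)
wavelengths = 2 * np.pi * 10000 ** (2 * i / d_model)

print(wavelengths[0])   # ~6.28 (2*pi): fastest-oscillating pair (dimensions 0 and 1)
print(wavelengths[-1])  # ~5.4e4, approaching 2*pi*10000: slowest-oscillating pair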
Visualizing this matrix helps reveal the structure of these encodings. A heatmap is an effective way to see how the encoding values change across positions and dimensions. We'll use the encodings generated above, for a sequence length of 50 and an embedding dimension of 128.
Heatmap visualizing sinusoidal positional encodings for a sequence of length 50 and embedding dimension 128. Each row represents a position, and each column represents a dimension index. Color intensity indicates the encoding value.
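A similar heatmap can be produced with Plotly along the lines of the sketch below; it reuses the positional_encodings matrix from the example above, and the colorscale and layout settings are illustrative choices rather than the exact configuration behind the figure.

import plotly.graph_objects as go

fig = go.Figure(
    data=go.Heatmap(
        z=positional_encodings,  # rows = positions, columns = dimension indices
        colorscale="RdBu",
        zmin=-1.0,
        zmax=1.0,
    )
)
fig.update_layout(
    title="Sinusoidal positional encodings",
    xaxis_title="Dimension index",
    yaxis_title="Position",
)
fig.show()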
From the heatmap, several properties discussed earlier become visually apparent: each row (position) receives a unique pattern of values, dimensions with low indices oscillate rapidly across positions while higher dimensions vary slowly, and every value stays bounded within [-1, 1].
Let's examine this uniqueness further by plotting the encoding vectors for a few specific positions (for example, positions 0, 10, and 25) across all dimensions.
Line plots comparing the 128-dimensional positional encoding vectors for positions 0, 10, and 25. The distinct shape of each line highlights the unique encoding assigned to each sequence position.
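A comparable comparison plot can be generated with Plotly roughly as follows; the trace names and titles are illustrative, and the positional_encodings matrix from the earlier example is reused.

import plotly.graph_objects as go

fig = go.Figure()
for pos in (0, 10, 25):
    fig.add_trace(
        go.Scatter(
            y=positional_encodings[pos],  # the 128-dimensional encoding vector
            mode="lines",
            name=f"position {pos}",
        )
    )
fig.update_layout(
    title="Positional encoding vectors at positions 0, 10, and 25",
    xaxis_title="Dimension index",
    yaxis_title="Encoding value",
)
fig.show()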
These visualizations confirm that sinusoidal positional encodings provide a distinct signal for each position, varying smoothly across dimensions with different frequencies. This positional signal is then added to the input token embeddings, allowing the subsequent self-attention layers to consider the order of elements in the sequence.
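Since the positional signal is combined with the token embeddings by simple addition, a minimal sketch of that step looks like the following; here token_embeddings is a hypothetical stand-in array of random values rather than embeddings from a learned embedding table.

import numpy as np

seq_len, d_model = 50, 128

# Hypothetical token embeddings; in a real model these come from a learned
# embedding table applied to the tokenized input sequence.
rng = np.random.default_rng(0)
token_embeddings = rng.normal(size=(seq_len, d_model))

pos_encoding = get_positional_encoding(seq_len, d_model)
encoder_input = token_embeddings + pos_encoding[:seq_len]  # element-wise addition

print(encoder_input.shape)  # (50, 128)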
In the next chapter, we will assemble these components, along with the multi-head attention mechanism, into the full Transformer encoder and decoder stacks.