Transformers rely entirely on attention, which treats its input as an unordered set, so they lack the inherent order sensitivity of RNNs and positional information must be injected explicitly. The standard sinusoidal and learned absolute positional encodings introduced earlier provide a workable baseline, but they have limitations, particularly in generalizing to sequences longer than those seen during training and in representing the relative distances between tokens.
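To ground that claim, the sketch below builds the fixed sinusoidal table from the original Transformer in NumPy; the function name and implementation details are illustrative rather than code from the earlier chapters.

```python
import numpy as np

def sinusoidal_positional_encoding(max_len: int, d_model: int) -> np.ndarray:
    """Fixed sinusoidal table: even dimensions use sine, odd dimensions use
    cosine, at geometrically spaced frequencies. Assumes d_model is even."""
    positions = np.arange(max_len)[:, np.newaxis]           # (max_len, 1)
    dims = np.arange(0, d_model, 2)[np.newaxis, :]          # (1, d_model / 2)
    angles = positions / np.power(10000.0, dims / d_model)  # (max_len, d_model / 2)

    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

pe = sinusoidal_positional_encoding(max_len=512, d_model=64)
print(pe.shape)  # (512, 64)
```

Two properties of this table motivate the rest of the chapter: it is precomputed up to a fixed `max_len`, and the relative distance between two positions is only available to attention implicitly, not as an explicit input.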
This chapter examines alternative approaches designed to encode positional information more effectively or efficiently. We will cover:

13.1 Limitations of Absolute Positional Encodings
13.2 Relative Positional Encoding Concepts
13.3 Implementation of Shaw et al.'s Relative Position
13.4 Transformer-XL Relative Positional Encoding
13.5 Rotary Position Embedding (RoPE)

By the end of this chapter, you will understand the mechanics of these advanced positional encoding techniques and the contexts where they offer advantages over standard absolute encodings.
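As a preview of the relative encodings developed in 13.2 and 13.3, the sketch below adds a learned scalar bias, indexed by the clipped distance j - i, directly to the attention scores. This is a simplified, illustrative variant: the names are placeholders, and Shaw et al.'s method (covered later) uses vector-valued relative representations rather than a single scalar per distance.

```python
import numpy as np

def scores_with_relative_bias(q, k, rel_bias, max_distance):
    """Add a bias indexed by relative distance (j - i) to raw attention scores.

    q, k:     (seq_len, d_head) queries and keys for one head
    rel_bias: (2 * max_distance + 1,) learned scalars, one per clipped distance
    """
    seq_len, d_head = q.shape
    scores = q @ k.T / np.sqrt(d_head)                  # (seq_len, seq_len)

    # Entry (i, j) of the distance matrix is j - i, clipped to the bias range.
    pos = np.arange(seq_len)
    dist = np.clip(pos[None, :] - pos[:, None], -max_distance, max_distance)

    # Shift distances into valid indices [0, 2 * max_distance] and look up the bias.
    return scores + rel_bias[dist + max_distance]

rng = np.random.default_rng(0)
q, k = rng.normal(size=(6, 8)), rng.normal(size=(6, 8))
rel_bias = rng.normal(size=(2 * 4 + 1,))
print(scores_with_relative_bias(q, k, rel_bias, max_distance=4).shape)  # (6, 6)
```

Because the bias depends only on the offset j - i, the same lookup applies at any absolute position, which is what lets relative schemes generalize more gracefully to longer sequences.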