To understand the innovations of the Transformer architecture, it's helpful to first examine the sequence processing models that came before it. This chapter provides a concise review of Recurrent Neural Networks (RNNs) and their more sophisticated variants, Long Short-Term Memory (LSTM) networks and Gated Recurrent Units (GRUs).
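We will look at how simple RNNs process sequences, where they fall short, how the gated LSTM and GRU architectures address those shortcomings, and how RNNs are combined into sequence-to-sequence models.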
This review establishes the context for understanding why attention mechanisms and the Transformer architecture, covered in the next chapter, represented a significant shift in how sequential data is modeled.
3.1 Fundamentals of Recurrent Neural Networks (RNNs)
3.2 Limitations of Simple RNNs
3.3 Long Short-Term Memory (LSTM) Networks
3.4 Gated Recurrent Units (GRUs)
3.5 Sequence-to-Sequence Models with RNNs