Long Short-Term Memory, Sepp Hochreiter, Jürgen Schmidhuber, 1997, Neural Computation, Vol. 9 (The MIT Press), DOI: 10.1162/neco.1997.9.8.1735 - Introduces the original LSTM architecture, outlining its gates and the sequential computation flow.
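For orientation only (not part of the citation), the gating structure is commonly written as below. Note this is the modern formulation: the forget gate was added in later work (Gers et al., 2000), whereas the 1997 paper describes the input and output gates around the constant-error carousel.

\begin{align*}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
h_t &= o_t \odot \tanh(c_t)
\end{align*}

Each step depends on h_{t-1} and c_{t-1}, which is the sequential computation flow the annotation refers to.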
Deep Learning, Ian Goodfellow, Yoshua Bengio, Aaron Courville, 2016 (The MIT Press) - Offers a comprehensive explanation of recurrent neural networks, covering their sequential characteristics and computational graphs. See Chapter 10: Sequence Modeling: Recurrent and Recursive Nets.
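For context, the recurrence treated in that chapter has the general form h^{(t)} = f(h^{(t-1)}, x^{(t)}; \theta); a common concrete instance is the vanilla RNN update:

\begin{align*}
h^{(t)} &= \tanh\!\left(b + W h^{(t-1)} + U x^{(t)}\right) \\
o^{(t)} &= c + V h^{(t)}
\end{align*}

Because h^{(t)} cannot be computed before h^{(t-1)}, the computational graph must be unrolled step by step in time.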
Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, 2017, Advances in Neural Information Processing Systems 30 (NIPS 2017), DOI: 10.48550/arXiv.1706.03762 - Presents the Transformer architecture, citing the sequential computation bottleneck of recurrent networks as a driving motivation.
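For reference, the core operation that replaces recurrence in that paper is scaled dot-product attention, which can be evaluated for all positions of a sequence at once:

\[
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V
\]

Since no term depends on the output at a previous position, the per-layer computation parallelizes across the sequence, in contrast to the recurrences above.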