Long Short-Term Memory, Sepp Hochreiter, Jürgen Schmidhuber, 1997, Neural Computation, Vol. 9 (MIT Press), DOI: 10.1162/neco.1997.9.8.1735 - The original paper introducing the Long Short-Term Memory (LSTM) architecture, detailing its design to overcome the vanishing gradient problem in recurrent neural networks.
Deep Learning, Ian Goodfellow, Yoshua Bengio, Aaron Courville, 2016 (MIT Press) - Chapter 10, 'Sequence Modeling: Recurrent and Recursive Networks,' offers a rigorous mathematical treatment of LSTMs, their history, and their role in deep learning.
Lecture Notes on Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM), Abigail See, Chris Potts, 2019 (Stanford University) - Lecture notes from a leading university course on natural language processing, providing a structured and in-depth explanation of RNNs, the vanishing gradient problem, and the mechanics of LSTM.