Long Short-Term Memory, Sepp Hochreiter and Jürgen Schmidhuber, 1997, Neural Computation, Vol. 9 (The MIT Press), DOI: 10.1162/neco.1997.9.8.1735 - The foundational paper introducing the LSTM architecture, detailing its design to overcome the limitations of traditional RNNs regarding long-term dependencies and gradient issues.
Deep Learning, Ian Goodfellow, Yoshua Bengio, and Aaron Courville, 2016 (MIT Press) - A widely used textbook offering in-depth theoretical explanations of recurrent neural networks, including a dedicated section on LSTMs and their benefits for sequence modeling.