Speech and Language Processing, Daniel Jurafsky and James H. Martin, 2025 - A standard textbook covering traditional NLP techniques, the limitations of order-agnostic models, and an introduction to neural sequence models.
Deep Learning, Ian Goodfellow, Yoshua Bengio, and Aaron Courville, 2016 (MIT Press) - A comprehensive resource offering a mathematical foundation for deep learning, including detailed sections on recurrent neural networks and their variants.
Long Short-Term Memory, Sepp Hochreiter and Jürgen Schmidhuber, 1997 (Neural Computation, Vol. 9, MIT Press), DOI: 10.1162/neco.1997.9.8.1735 - The original paper introducing Long Short-Term Memory (LSTM) networks, which address the vanishing gradient problem in recurrent neural networks.