Deep Learning, Ian Goodfellow, Yoshua Bengio, and Aaron Courville, 2016 (MIT Press) - Provides a comprehensive introduction to deep learning, including detailed explanations of recurrent neural networks, LSTMs, and GRUs.
Efficient Estimation of Word Representations in Vector Space, Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean, 2013, International Conference on Learning Representations (ICLR). DOI: 10.48550/arXiv.1301.3781 - Introduces the Word2Vec models (CBOW and Skip-gram) for learning word embeddings, which are fundamental for preparing text for sequence models (a brief usage sketch follows this list).
Sequence to Sequence Learning with Neural Networks, Ilya Sutskever, Oriol Vinyals, and Quoc V. Le, 2014, Advances in Neural Information Processing Systems (NIPS), Vol. 27. DOI: 10.48550/arXiv.1409.3215 - Presents the encoder-decoder architecture, which uses LSTMs to map input sequences to output sequences of potentially different lengths, as in machine translation and other seq2seq tasks (a minimal sketch of this architecture also follows this list).
CS224n: Natural Language Processing with Deep Learning, Diyi Yang and Tatsunori Hashimoto, 2023 (Stanford University) - Provides lecture notes, assignments, and videos covering sequence models, word embeddings, and various NLP applications discussed in this section.
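The Word2Vec models described in Mikolov et al. are implemented in several libraries; the sketch below uses Gensim's Word2Vec class purely as an illustration. The toy corpus and hyperparameter values are assumptions for demonstration, not settings from the paper.

```python
from gensim.models import Word2Vec

# Toy corpus: each sentence is a list of pre-tokenized words.
# (Illustrative data only; any tokenized text works here.)
sentences = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "ran", "in", "the", "park"],
]

# sg=1 selects the Skip-gram objective; sg=0 would select CBOW.
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)

vec = model.wv["cat"]  # 50-dimensional embedding for "cat"
print(model.wv.most_similar("cat", topn=2))
```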
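The encoder-decoder pattern from Sutskever et al. can likewise be sketched in a few lines of PyTorch. The layer sizes, vocabulary sizes, and teacher-forcing setup below are illustrative assumptions rather than the paper's configuration (which used deep LSTMs and a reversed source sequence); the point is only that the encoder's final state initializes the decoder.

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Minimal encoder-decoder: the encoder LSTM compresses the source
    sequence into its final (hidden, cell) state, which initializes the
    decoder LSTM over the target sequence."""
    def __init__(self, src_vocab, tgt_vocab, emb_dim=64, hid_dim=128):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb_dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.decoder = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, tgt_vocab)

    def forward(self, src, tgt):
        # Encode: keep only the final (hidden, cell) state.
        _, state = self.encoder(self.src_emb(src))
        # Decode with teacher forcing on the shifted target sequence.
        dec_out, _ = self.decoder(self.tgt_emb(tgt), state)
        return self.out(dec_out)  # (batch, tgt_len, tgt_vocab) logits

# Toy usage: source and target sequences of different lengths.
model = Seq2Seq(src_vocab=1000, tgt_vocab=1200)
src = torch.randint(0, 1000, (2, 5))   # batch of 2, source length 5
tgt = torch.randint(0, 1200, (2, 7))   # batch of 2, target length 7
logits = model(src, tgt)
print(logits.shape)  # torch.Size([2, 7, 1200])
```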