Long Short-Term Memory, Sepp Hochreiter and Jürgen Schmidhuber, 1997. Neural Computation, Vol. 9 (MIT Press). DOI: 10.1162/neco.1997.9.8.1735 - Introduces the Long Short-Term Memory (LSTM) architecture, which addresses vanishing gradients in recurrent neural networks and is widely used in sequence modeling tasks like text classification.
Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation, Kyunghyun Cho, Bart van Merrienboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio, 2014. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). DOI: 10.48550/arXiv.1406.1078 - Presents the Gated Recurrent Unit (GRU), a simplified and computationally efficient variant of the LSTM, offering an alternative for sequence modeling.
Deep Learning, Ian Goodfellow, Yoshua Bengio, and Aaron Courville, 2016 (MIT Press) - Discusses recurrent neural networks, including LSTMs, GRUs, and bidirectional RNNs, and provides the theoretical background for applying them to sequence modeling and text classification.
Keras API: tf.keras.layers.LSTM, TensorFlow Authors, 2024 (TensorFlow) - Official documentation for the Keras LSTM layer, detailing parameters such as units and return_sequences, describing the layer's masking support (typically enabled via an Embedding layer's mask_zero argument), and showing how it integrates with other layers in text classification architectures. The same concepts apply to the GRU layer and the Bidirectional wrapper. A short sketch of how these pieces fit together follows.
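The following is a minimal sketch, not taken from the referenced documentation, of how the layers mentioned above are commonly combined for binary text classification: an Embedding layer with mask_zero=True to ignore padded timesteps, stacked Bidirectional LSTM layers, and a Dense output head. Vocabulary size, sequence length, and unit counts are illustrative assumptions.

import tensorflow as tf

VOCAB_SIZE = 10_000   # assumed vocabulary size
MAX_LEN = 200         # assumed padded sequence length

model = tf.keras.Sequential([
    tf.keras.Input(shape=(MAX_LEN,)),
    # mask_zero=True makes downstream layers skip padded (index 0) timesteps
    tf.keras.layers.Embedding(VOCAB_SIZE, 128, mask_zero=True),
    # return_sequences=True emits the full hidden-state sequence so another
    # recurrent layer can be stacked on top
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64, return_sequences=True)),
    # the final recurrent layer returns only the last hidden state
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # binary classification output
])

model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=["accuracy"])
model.summary()

Swapping tf.keras.layers.LSTM for tf.keras.layers.GRU in this sketch yields the lighter-weight variant described in the Cho et al. reference, with no other structural changes required.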