Deep Learning, Ian Goodfellow, Yoshua Bengio, and Aaron Courville, 2016 (MIT Press) - Provides comprehensive coverage of recurrent neural networks, their computational graphs, and the backpropagation through time algorithm, detailing the inherent sequential dependencies that limit parallelization.
Long Short-Term Memory, Sepp Hochreiter and Jürgen Schmidhuber, 1997, Neural Computation, Vol. 9 (The MIT Press), DOI: 10.1162/neco.1997.9.8.1735 - Presents the foundational architecture of Long Short-Term Memory (LSTM) networks, illustrating the design of a widely used recurrent unit that exhibits the discussed sequential dependencies.
Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation, Kyunghyun Cho, Bart van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio, 2014, Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) (Association for Computational Linguistics), DOI: 10.3115/v1/D14-1179 - Introduces the Gated Recurrent Unit (GRU), another significant recurrent model, further demonstrating architectures constrained by sequential computation.