Bidirectional Recurrent Neural Networks, Mike Schuster and Kuldip K. Paliwal, 1997. IEEE Transactions on Signal Processing, Vol. 45 (IEEE). DOI: 10.1109/78.627827 - Presents the concept of bidirectional recurrent neural networks, a technique explicitly used in the described acoustic model to leverage both past and future context in speech processing.
Long Short-Term Memory, Sepp Hochreiter and Jürgen Schmidhuber, 1997. Neural Computation, Vol. 9 (MIT Press). DOI: 10.1162/neco.1997.9.8.1735 - The seminal work introducing Long Short-Term Memory (LSTM) networks, a recurrent architecture for handling long-range dependencies in sequential data like speech, extensively used in the model.
Deep Learning, Ian Goodfellow, Yoshua Bengio, and Aaron Courville, 2016 (MIT Press) - Provides comprehensive coverage of deep learning fundamentals, including detailed explanations of recurrent neural networks (RNNs), LSTMs, dense layers, and softmax, which form the building blocks of the presented acoustic model.