Speech and Language Processing, Daniel Jurafsky and James H. Martin, 2025 (Pearson) - A comprehensive textbook with a detailed introduction to speech recognition, covering acoustic modeling principles, feature extraction, and the overall ASR pipeline.
Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks, Alex Graves, Santiago Fernández, Faustino Gomez, and Jürgen Schmidhuber, 2006, Proceedings of the 23rd International Conference on Machine Learning (ICML) (Association for Computing Machinery), DOI: 10.1145/1143844.1143891 - This foundational paper introduces Connectionist Temporal Classification (CTC), a training objective that lets recurrent neural networks map an input sequence to a shorter label sequence, as in acoustic modeling, without pre-segmented or frame-aligned training data.
Deep Neural Networks for Acoustic Modeling in Speech Recognition, Geoffrey Hinton, Li Deng, Dong Yu, George Dahl, Abdel-rahman Mohamed, Navdeep Jaitly, Andrew Senior, et al., 2012, IEEE Signal Processing Magazine, Vol. 29 (IEEE), DOI: 10.1109/MSP.2012.2205597 - A seminal review and tutorial, written jointly by several research groups, that helped establish deep neural networks (DNNs) as the dominant approach to acoustic modeling in speech recognition. It discusses their architecture and training.