Distilling the Knowledge in a Neural Network, Geoffrey Hinton, Oriol Vinyals, Jeff Dean. NIPS 2014 Deep Learning Workshop; arXiv preprint, 2015. DOI: 10.48550/arXiv.1503.02531 - The seminal paper introducing knowledge distillation, detailing the use of soft targets and temperature scaling to transfer knowledge from a large teacher model to a smaller student model (see the loss sketch after this list).
Knowledge Distillation: A Survey, Jianping Gou, Baosheng Yu, Stephen J. Maybank, Dacheng Tao. International Journal of Computer Vision, Vol. 129 (Springer), 2021. DOI: 10.1007/s11263-021-01453-z - A comprehensive survey covering knowledge distillation methods, theories, and applications across different domains, including speech.
FastSpeech: Fast, Robust and Controllable Text to Speech, Yi Ren, Yangjun Ruan, Xu Tan, Tao Qin, Sheng Zhao, Zhou Zhao, Tie-Yan Liu. Advances in Neural Information Processing Systems (NeurIPS), Vol. 32, 2019. DOI: 10.48550/arXiv.1905.09263 - Presents a non-autoregressive TTS model trained with knowledge distillation from an autoregressive teacher, achieving significant inference speedup while maintaining quality.
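
For readers who want the mechanics behind the first entry, below is a minimal sketch of the temperature-scaled soft-target objective described by Hinton et al. (2015), written in PyTorch. The function name, the alpha weighting, and the example values of T and alpha are illustrative assumptions, not values prescribed by the paper.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Combine the soft-target KD loss (temperature T) with the usual hard-label loss.

    T and alpha are illustrative hyperparameters, not values from the paper.
    """
    # Soft targets: the teacher's temperature-softened output distribution.
    soft_targets = F.softmax(teacher_logits / T, dim=-1)
    log_student = F.log_softmax(student_logits / T, dim=-1)
    # KL divergence between the softened distributions; the T**2 factor keeps
    # gradient magnitudes comparable across temperatures (Hinton et al., 2015).
    kd = F.kl_div(log_student, soft_targets, reduction="batchmean") * (T ** 2)
    # Standard cross-entropy on the ground-truth hard labels.
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce

# Example usage with random tensors (batch of 8, 10 classes).
student_logits = torch.randn(8, 10)
teacher_logits = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))
loss = distillation_loss(student_logits, teacher_logits, labels)
```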