Distilling the Knowledge in a Neural Network, Geoffrey Hinton, Oriol Vinyals, Jeff Dean, 2015, arXiv preprint arXiv:1503.02531, DOI: 10.48550/arXiv.1503.02531 - The seminal paper that introduced the concept of knowledge distillation, detailing the use of soft targets and temperature scaling for training a smaller student model from a larger teacher.
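The soft-target-plus-temperature idea from the paper above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the teacher's logits are softened with a temperature T, and the student is trained against the resulting distribution via cross-entropy, scaled by T^2 as the paper suggests. Function names here are illustrative.

```python
import math

def softened_probs(logits, T):
    """Softmax over logits divided by temperature T (higher T -> softer targets)."""
    exps = [math.exp(z / T) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, T):
    """Cross-entropy between the teacher's soft targets and the student's
    softened predictions, scaled by T**2 to keep gradient magnitudes
    comparable across temperatures (as noted in Hinton et al., 2015)."""
    p = softened_probs(teacher_logits, T)   # teacher soft targets
    q = softened_probs(student_logits, T)   # student soft predictions
    return (T ** 2) * -sum(pi * math.log(qi) for pi, qi in zip(p, q))

# A higher temperature flattens the teacher's distribution, exposing the
# relative similarities between non-target classes ("dark knowledge").
teacher = [6.0, 2.0, 1.0]
sharp = softened_probs(teacher, T=1.0)
soft = softened_probs(teacher, T=5.0)
```

In practice this distillation term is combined with a standard cross-entropy loss on the true labels, weighted by a mixing coefficient.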
Knowledge Distillation: A Survey, Jianping Gou, Baosheng Yu, Stephen J. Maybank, Dacheng Tao, 2021, International Journal of Computer Vision, Vol. 129 (Springer Nature), DOI: 10.1007/s11263-021-01459-y - A comprehensive survey providing an overview of various knowledge distillation methods, their applications, and recent advancements, useful for a deeper understanding of the field.