Born Again Neural Networks, Tommaso Furlanello, Zachary Lipton, Michael Tschannen, Laurent Itti, Anima Anandkumar, 2018. Proceedings of the 35th International Conference on Machine Learning, Vol. 80 (PMLR) - Presents self-distillation, where a new generation of a model is trained using the predictions of its previously trained self as a teacher, demonstrating performance improvements over the original.
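
To make the training objective concrete, here is a minimal PyTorch sketch of a self-distillation loss in the spirit of the paper: the usual cross-entropy on hard labels is combined with a KL term matching the student to a frozen copy of the previous generation. The function name, temperature `T`, and mixing weight `alpha` are illustrative assumptions, not values from the paper.

```python
import torch
import torch.nn.functional as F

def born_again_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Cross-entropy on ground-truth labels plus a distillation term
    matching the student's softened predictions to those of a frozen
    teacher (the previous generation of the same model)."""
    hard = F.cross_entropy(student_logits, labels)
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),  # teacher_logits come from a frozen model
        reduction="batchmean",
    ) * (T * T)  # rescale so gradients are comparable to the hard-label term
    return alpha * hard + (1 - alpha) * soft
```

In practice the teacher's logits would be computed under `torch.no_grad()` so that gradients flow only through the student.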
Unsupervised Data Augmentation for Consistency Training, Qizhe Xie, Zihang Dai, Eduard Hovy, Minh-Thang Luong, Quoc V. Le, 2020. Advances in Neural Information Processing Systems (NeurIPS), Vol. 33 (Curran Associates Inc.). DOI: 10.48550/arXiv.1904.12848 - Explores data augmentation combined with consistency training, a strategy highly relevant for distillation-aware augmentation where the model learns invariances from augmented inputs.
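
As an illustration of the consistency objective described above, here is a minimal PyTorch sketch: predictions on a clean unlabeled input serve as a fixed target for predictions on its augmented counterpart. The `augment` callable and the sharpening temperature `T` are assumptions for illustration; any strong augmentation (e.g., RandAugment for images) could play that role.

```python
import torch
import torch.nn.functional as F

def consistency_loss(model, x, augment, T=0.4):
    """Penalize divergence between the model's predictions on a clean
    unlabeled input and on an augmented view of it. The clean-input
    prediction is sharpened and treated as a fixed pseudo-target."""
    with torch.no_grad():  # no gradient through the pseudo-target
        target = F.softmax(model(x) / T, dim=-1)
    log_pred = F.log_softmax(model(augment(x)), dim=-1)
    return F.kl_div(log_pred, target, reduction="batchmean")
```

This unsupervised term is typically added to a standard supervised loss on the labeled portion of the batch, which is what lets the model learn invariances from unlabeled augmented inputs.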