Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, Illia Polosukhin, 2017. Advances in Neural Information Processing Systems, Vol. 30 (Curran Associates, Inc.) - Introduces the Transformer architecture, which has strongly influenced ASR models, and describes its associated learning rate schedule with warmup and decay.
Curriculum Learning, Yoshua Bengio, Jérôme Louradour, Ronan Collobert, Jason Weston, 2009. Proceedings of the 26th Annual International Conference on Machine Learning - ICML '09 (ACM Press). DOI: 10.1145/1553374.1553381 - The foundational paper introducing curriculum learning, a training strategy that orders training examples from easy to hard.