BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Association for Computational Linguistics. DOI: 10.18653/v1/N19-1423 - Introduces the BERT model and its pre-training objectives (masked language modeling and next sentence prediction), and explains the purpose and usage of the [CLS], [SEP], and [MASK] tokens.
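As a concrete illustration of the masked language modeling objective this paper introduces, the sketch below applies BERT's corruption rule in plain Python: 15% of non-special tokens are selected for prediction, and of those, 80% are replaced with [MASK], 10% with a random token, and 10% left unchanged. The selection rate, the 80/10/10 split, and the special tokens follow the paper; the toy vocabulary and function name are hypothetical.

```python
import random

MASK, CLS, SEP = "[MASK]", "[CLS]", "[SEP]"
SPECIAL = {MASK, CLS, SEP}

def mask_for_mlm(tokens, vocab, mask_prob=0.15, seed=None):
    """BERT-style MLM masking: of the selected tokens, 80% become [MASK],
    10% become a random vocabulary token, 10% stay unchanged.
    Returns the corrupted sequence and the positions the model must predict."""
    rng = random.Random(seed)
    corrupted, targets = list(tokens), {}
    for i, tok in enumerate(tokens):
        if tok in SPECIAL or rng.random() >= mask_prob:
            continue
        targets[i] = tok  # the model must recover the original token here
        r = rng.random()
        if r < 0.8:
            corrupted[i] = MASK
        elif r < 0.9:
            corrupted[i] = rng.choice(vocab)
        # else: keep the original token (10% of the time)
    return corrupted, targets

# BERT packs sentence pairs as [CLS] A [SEP] B [SEP] for next sentence prediction.
tokens = [CLS, "the", "cat", "sat", SEP, "it", "purred", SEP]
vocab = ["the", "cat", "sat", "it", "purred", "dog", "ran"]
print(mask_for_mlm(tokens, vocab, seed=0))
```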
Neural Machine Translation of Rare Words with Subword Units. Rico Sennrich, Barry Haddow, and Alexandra Birch. 2016. Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics. DOI: 10.18653/v1/P16-1162 - Introduces byte-pair encoding (BPE) for subword tokenization, a foundational algorithm that lowers out-of-vocabulary rates and keeps vocabulary size manageable, laying the groundwork for special tokens that add structural information.
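The paper includes a short Python reference implementation of the BPE merge-learning loop, and the sketch below stays close to it. Words are space-separated symbol sequences ending in a word-boundary marker </w>, and each iteration merges the corpus's most frequent adjacent symbol pair into a new symbol; the word-frequency table mirrors the paper's worked example.

```python
import re
import collections

def get_stats(vocab):
    """Count frequencies of adjacent symbol pairs across the corpus vocabulary."""
    pairs = collections.defaultdict(int)
    for word, freq in vocab.items():
        symbols = word.split()
        for i in range(len(symbols) - 1):
            pairs[(symbols[i], symbols[i + 1])] += freq
    return pairs

def merge_vocab(pair, vocab):
    """Merge every occurrence of the given symbol pair into one new symbol."""
    bigram = re.escape(" ".join(pair))
    pattern = re.compile(r"(?<!\S)" + bigram + r"(?!\S)")
    return {pattern.sub("".join(pair), word): freq for word, freq in vocab.items()}

# Words pre-split into characters, with </w> marking the end of a word.
vocab = {"l o w </w>": 5, "l o w e r </w>": 2,
         "n e w e s t </w>": 6, "w i d e s t </w>": 3}
for _ in range(10):  # the number of merges controls the final vocabulary size
    pairs = get_stats(vocab)
    best = max(pairs, key=pairs.get)
    vocab = merge_vocab(best, vocab)
    print(best)  # e.g. ('e', 's'), then ('es', 't'), then ('est', '</w>'), ...
```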
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation. Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, Jeff Klingner, Apurva Shah, Melvin Johnson, Xiaobing Liu, Łukasz Kaiser, Stephan Gouws, Yoshikiyo Kato, Taku Kudo, Hideto Kazawa, Keith Stevens, George Kurian, Nishant Patil, Wei Wang, Cliff Young, Jason Smith, Jason Riesa, Alex Rudnick, Oriol Vinyals, Greg Corrado, Macduff Hughes, and Jeffrey Dean. 2016. arXiv preprint arXiv:1609.08144. DOI: 10.48550/arXiv.1609.08144 - Introduces WordPiece tokenization, an alternative subword method used in models such as BERT; it complements BPE in managing vocabulary size and rare words.
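The paper describes WordPiece training only at a high level, but applying a trained vocabulary is commonly done with greedy longest-match-first segmentation, which is how BERT's tokenizer uses its WordPiece vocabulary. A minimal sketch under that assumption, using BERT's ## prefix convention for word-internal pieces; the toy vocabulary is hypothetical.

```python
def wordpiece_tokenize(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first segmentation of one word against a
    WordPiece vocabulary; non-initial pieces carry the ## prefix (BERT style)."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        piece = None
        while start < end:  # shrink the candidate until it is in the vocabulary
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:  # no sub-piece matched: the whole word maps to [UNK]
            return [unk]
        pieces.append(piece)
        start = end
    return pieces

# Hypothetical toy vocabulary; real vocabularies are learned from a corpus.
vocab = {"un", "##aff", "##able", "play", "##ing"}
print(wordpiece_tokenize("unaffable", vocab))  # ['un', '##aff', '##able']
print(wordpiece_tokenize("playing", vocab))    # ['play', '##ing']
print(wordpiece_tokenize("xyz", vocab))        # ['[UNK]']
```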