Neural Machine Translation of Rare Words with Subword Units, Rico Sennrich, Barry Haddow, Alexandra Birch, 2016. Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (Association for Computational Linguistics). DOI: 10.18653/v1/P16-1162 - Introduces byte pair encoding (BPE), a widely used subword tokenization method, and illustrates why metrics such as bits per character (BPC) are needed to fairly compare models that use different tokenization schemes.