Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, 2017. Advances in Neural Information Processing Systems 30 (NIPS 2017), Vol. 30. DOI: 10.48550/arXiv.1706.03762 - Introduces the Transformer architecture and the self-attention mechanism, both foundational to modern contextual embedding models.
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova, 2019. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers) (Association for Computational Linguistics). DOI: 10.18653/v1/N19-1423 - Presents BERT, a landmark model that learns deep bidirectional contextual representations by pre-training on large text corpora.
Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks, Nils Reimers, Iryna Gurevych, 2019. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) (Association for Computational Linguistics). DOI: 10.18653/v1/D19-1410 - Describes SBERT, a modification of BERT designed to produce semantically meaningful sentence embeddings for efficient similarity tasks.