Efficient Estimation of Word Representations in Vector Space, Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean, 2013. arXiv preprint arXiv:1301.3781. DOI: 10.48550/arXiv.1301.3781 - Introduces Word2Vec, a foundational method for learning dense word embeddings that capture semantic and syntactic relationships, illustrating the core idea of mapping words to vectors in a continuous space.
Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin, 2017. Advances in Neural Information Processing Systems 30 (NeurIPS). DOI: 10.5555/3295222.3295349 - Presents the Transformer architecture, which has become the basis for most modern embedding models and enables learning contextual representations of text.
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova, 2019. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). DOI: 10.18653/v1/N19-1423 - Introduces BERT, a significant advance in Transformer-based pre-trained language models and a key source of context-aware embeddings.
Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks, Nils Reimers and Iryna Gurevych, 2019. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). DOI: 10.18653/v1/D19-1410 - Describes Sentence-BERT, a method for creating semantically meaningful sentence and document embeddings from pre-trained BERT-like models, optimized for tasks like semantic similarity search (a brief usage sketch follows this list).
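To make the shared idea behind these references concrete, here is a minimal sketch of embedding-based semantic similarity using the sentence-transformers library, the open-source package maintained by the Sentence-BERT authors. The model name all-MiniLM-L6-v2, the example sentences, and the assumption that the package is installed are illustrative choices, not details taken from the papers above.

```python
# Minimal sketch: map sentences to dense vectors and compare them by cosine
# similarity, the core idea running through the papers listed above.
# Assumes `pip install sentence-transformers`; the checkpoint name
# "all-MiniLM-L6-v2" is an illustrative choice, not prescribed by the papers.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = [
    "A cat sits on the mat.",
    "A feline is resting on a rug.",
    "The stock market fell sharply today.",
]

# Encode each sentence into a fixed-size embedding vector.
embeddings = model.encode(sentences, convert_to_tensor=True)

# Pairwise cosine similarities: semantically close sentences score higher.
scores = util.cos_sim(embeddings, embeddings)
print(scores)
```

With a model of this kind, the first two sentences should score noticeably more similar to each other than either does to the third, which is the property that makes such embeddings useful for semantic similarity search.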