GloVe: Global Vectors for Word Representation, Jeffrey Pennington, Richard Socher, Christopher Manning, 2014, Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) (Association for Computational Linguistics), DOI: 10.3115/v1/D14-1162 - Presents the GloVe model, which combines global matrix factorization and local context window methods for word representation learning.
Enriching Word Vectors with Subword Information, Piotr Bojanowski, Edouard Grave, Armand Joulin, Tomas Mikolov, 2017, Transactions of the Association for Computational Linguistics (TACL), Vol. 5 (MIT Press), DOI: 10.1162/tacl_a_00051 - Describes fastText, an extension of Word2Vec that incorporates subword (character n-gram) information, allowing for better handling of rare and out-of-vocabulary words.
Speech and Language Processing (3rd ed. draft), Daniel Jurafsky, James H. Martin, 2025 (Stanford University) - A comprehensive NLP textbook with detailed chapters on word embeddings, their history, and their applications. Chapter 6, "Vector Semantics and Embeddings", is especially relevant.