Text Data Representation: From Characters to Meaning
Speech and Language Processing (3rd ed. draft), Daniel Jurafsky and James H. Martin, 2025 - An authoritative and comprehensive textbook for natural language processing, covering text preprocessing, tokenization, and various word representation methods.
GloVe: Global Vectors for Word Representation, Jeffrey Pennington, Richard Socher, Christopher Manning, 2014, Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) (Association for Computational Linguistics), DOI: 10.3115/v1/D14-1162 - Presents GloVe, an unsupervised model for learning word embeddings based on global co-occurrence statistics, offering an alternative to Word2Vec.
Introduction to Information Retrieval, Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze, 2008 (Cambridge University Press) - A standard textbook for information retrieval, detailing fundamental text representation models like Bag-of-Words and TF-IDF.