Introduction to Information Retrieval, Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze, 2008 (Cambridge University Press) - Covers the vector space model, a fundamental information retrieval framework that represents documents as vectors and scores their similarity.
Speech and Language Processing (3rd edition draft), Daniel Jurafsky and James H. Martin, 2025 (Stanford University) - Provides an introduction to word embeddings, their semantic properties, and the use of cosine similarity to compare the meaning of texts.
The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Trevor Hastie, Robert Tibshirani, and Jerome Friedman, 2009 (Springer) - Explains the mathematical foundations of various distance metrics, including Euclidean and Manhattan distances, within the context of statistical learning.
OpenAI Embeddings API, OpenAI, 2024 (OpenAI) - Official guide to OpenAI's embedding models, detailing their characteristics, normalization, and recommended use of cosine similarity for semantic comparisons.