Monitoring Drift in Embeddings and Unstructured Data
A Kernel Method for the Two-Sample Problem, Arthur Gretton, Karsten M. Borgwardt, Malte J. Rasch, Bernhard Schölkopf, Alexander Smola, 2012. Journal of Machine Learning Research, Vol. 13 (JMLR). DOI: 10.1162/jmlr.2012.13.1.723 - This paper formally introduces Maximum Mean Discrepancy (MMD) as a non-parametric two-sample test, a fundamental tool for comparing high-dimensional distributions.
Sliced Wasserstein Distance for Learning Models, Nicolas Bonneel, Mathieu Coeurjolly, Jean-Denis Durou, Florian Le Cun, 2015. ACM Transactions on Graphics (TOG), Vol. 34 (Association for Computing Machinery). DOI: 10.1145/2816795.2818139 - This paper explores the Sliced Wasserstein Distance, an efficient approximation of the Earth Mover's Distance that is more computationally feasible for high-dimensional data like embeddings.