While monitoring individual features for drift using univariate statistical tests like Kolmogorov-Smirnov provides a basic check, it often fails to capture the full picture. Production data rarely changes one feature at a time. More commonly, the relationships between features shift, altering the joint distribution of the data even if the marginal distributions of individual features appear stable. Relying solely on univariate tests can lead to a false sense of security, missing significant drifts that degrade model performance.
Consider a model predicting loan defaults based on income and debt_level. The individual distributions of income and debt_level might remain similar over time. However, if the correlation between them changes (perhaps higher-income individuals start taking on proportionally more debt than before), the model's understanding of risk, learned from the training data's correlations, becomes outdated. This change in the joint distribution is multivariate drift, and detecting it requires techniques that look beyond single features.
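The following is a minimal sketch of this failure mode using NumPy and SciPy. The synthetic data, sample sizes, and correlation values are illustrative assumptions, not figures from the text: both datasets have standard normal marginals, so univariate KS tests typically see nothing, even though the joint distribution has changed.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Reference: income and debt_level with moderate positive correlation.
reference = rng.multivariate_normal(
    mean=[0, 0], cov=[[1.0, 0.3], [0.3, 1.0]], size=5000
)

# Target: identical marginals, but a much stronger correlation.
target = rng.multivariate_normal(
    mean=[0, 0], cov=[[1.0, 0.8], [0.8, 1.0]], size=5000
)

# Univariate KS test on each feature: both usually fail to reject,
# despite the clear shift in the joint distribution.
for i, name in enumerate(["income", "debt_level"]):
    stat, p = stats.ks_2samp(reference[:, i], target[:, i])
    print(f"{name}: KS statistic={stat:.3f}, p-value={p:.3f}")
```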
Directly comparing high-dimensional probability distributions is computationally expensive and statistically challenging due to the "curse of dimensionality." As the number of features (d) increases, the volume of the feature space grows exponentially, making the data points increasingly sparse. This sparsity makes it difficult to estimate density accurately or apply traditional statistical tests reliably. Multivariate drift detection methods aim to overcome this by summarizing the high-dimensional distribution or focusing on specific aspects sensitive to change.
One approach is to use distance metrics that account for the correlation structure of the data. The Mahalanobis distance is a prominent example. Unlike Euclidean distance, which treats all dimensions equally, Mahalanobis distance measures the distance between a point and a distribution's center (mean), scaled by the data's covariance.
For a point x and a distribution with mean μ and covariance matrix Σ, the squared Mahalanobis distance is:
$$D_M^2(x, \mu, \Sigma) = (x - \mu)^\top \Sigma^{-1} (x - \mu)$$

In the context of drift detection, we compare a target (production) dataset to a reference (training) dataset. We can compute the Mahalanobis distance of each point in the target dataset relative to the reference distribution's mean μ_ref and covariance Σ_ref.
The distribution of these distances provides insight into drift. If the target data follows the same distribution as the reference data, the squared Mahalanobis distances should approximately follow a Chi-squared (χ2) distribution with d degrees of freedom (where d is the number of features), assuming the data is multivariate normal.
A common approach is:

1. Estimate the mean μ_ref and covariance matrix Σ_ref from the reference data.
2. Compute the squared Mahalanobis distance of each target point relative to μ_ref and Σ_ref.
3. Compare the resulting distances against the expected χ² distribution with d degrees of freedom (or against distances computed on held-out reference data), using a univariate test or a percentile threshold, as in the sketch below.
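Here is a minimal sketch of these steps with NumPy and SciPy. The synthetic reference and target arrays repeat the earlier example's setup, and the mahalanobis_sq helper and the 99th-percentile threshold are illustrative choices, not prescriptions from the text:

```python
import numpy as np
from scipy import stats

# Same synthetic income/debt_level setup as the earlier sketch.
rng = np.random.default_rng(42)
reference = rng.multivariate_normal([0, 0], [[1.0, 0.3], [0.3, 1.0]], size=5000)
target = rng.multivariate_normal([0, 0], [[1.0, 0.8], [0.8, 1.0]], size=5000)

def mahalanobis_sq(X, mean, cov_inv):
    """Squared Mahalanobis distance of each row of X from `mean`."""
    diff = X - mean
    return np.einsum("ij,jk,ik->i", diff, cov_inv, diff)

# Step 1: estimate the reference mean and covariance.
mu_ref = reference.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(reference, rowvar=False))
d = reference.shape[1]  # number of features

# Step 2: squared distances of target points relative to the reference.
d2_target = mahalanobis_sq(target, mu_ref, cov_inv)

# Step 3: under no drift (and multivariate normality), these distances
# follow a chi-squared distribution with d degrees of freedom.
stat, p = stats.kstest(d2_target, stats.chi2(df=d).cdf)
print(f"KS vs chi2({d}): statistic={stat:.3f}, p-value={p:.4f}")

# Alternatively, flag points beyond the 99th percentile of chi2(d).
threshold = stats.chi2(df=d).ppf(0.99)
print(f"Fraction beyond 99th percentile: {(d2_target > threshold).mean():.3f}")
```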
Advantages:

- Accounts for the correlation structure between features, not just their marginal distributions.
- Reduces the multivariate comparison to a single, cheap-to-compute scalar per data point.
- Has a known reference distribution (χ² with d degrees of freedom) under multivariate normality, enabling principled thresholds.
Disadvantages:

- The χ² relationship relies on the multivariate normality assumption, which real data often violates.
- Requires inverting the covariance matrix, which becomes unstable when features are highly correlated or when d approaches the number of samples.
- Summarizing each point as one distance can mask drifts that leave the mean and covariance roughly unchanged.
The marginal distributions (projections onto Feature 1 or Feature 2 axes) for the Reference (blue) and Target (orange) datasets might appear similar. However, the correlation structure has clearly changed, indicating multivariate drift. A univariate test on each feature might miss this shift.
Another strategy is to first reduce the dimensionality of the data and then apply drift detection methods (including univariate ones) in the lower-dimensional space. The idea is that significant changes in the high-dimensional structure will manifest as changes in the lower-dimensional representation.
Principal Component Analysis (PCA) is a common choice. Fit PCA on the reference data, project both the reference and target datasets onto the leading components, and apply univariate drift tests to each projected dimension. Because most of the variance is concentrated in a few components, drift in the joint structure often surfaces there.
Alternatively, monitor the principal components themselves. A significant change in the data distribution might alter the directions of maximum variance or the amount of variance explained by each component. Comparing the PCA eigenspectrum (eigenvalues) between the reference and target data can reveal such structural changes.
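A sketch of both PCA-based checks, assuming scikit-learn is available; the synthetic data, the per-component KS tests, and the direct eigenvalue comparison are illustrative assumptions:

```python
import numpy as np
from scipy import stats
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
reference = rng.multivariate_normal([0, 0], [[1.0, 0.3], [0.3, 1.0]], size=5000)
target = rng.multivariate_normal([0, 0], [[1.0, 0.8], [0.8, 1.0]], size=5000)

# Fit PCA on the reference data only, then project both datasets.
pca = PCA(n_components=2).fit(reference)
ref_proj = pca.transform(reference)
tgt_proj = pca.transform(target)

# Univariate KS test on each principal component.
for i in range(pca.n_components_):
    stat, p = stats.ks_2samp(ref_proj[:, i], tgt_proj[:, i])
    print(f"PC{i + 1}: KS statistic={stat:.3f}, p-value={p:.4f}")

# Compare eigenspectra: variance explained along each component.
ref_eigvals = pca.explained_variance_
tgt_eigvals = np.sort(np.linalg.eigvalsh(np.cov(target, rowvar=False)))[::-1]
print("Reference eigenvalues:", np.round(ref_eigvals, 3))
print("Target eigenvalues:   ", np.round(tgt_eigvals, 3))
```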
Advantages:

- Sidesteps the curse of dimensionality by reducing the comparison to a handful of dimensions.
- Lets you reuse well-understood univariate tests on the projected components.
- Monitoring the eigenspectrum can reveal structural changes in feature relationships, not just marginal shifts.
Disadvantages:

- PCA captures only linear structure; drift in nonlinear relationships may go undetected.
- Drift concentrated in low-variance directions that are discarded will be missed.
- Principal components are linear combinations of features, which makes it harder to diagnose which original features drifted.
Comparing Covariance Matrices Directly: Methods exist to directly compare the reference covariance matrix Σ_ref with a target covariance matrix Σ_target calculated on a window of recent data. This can involve calculating matrix distances (e.g., the Frobenius norm ‖Σ_ref − Σ_target‖_F) or statistical tests based on likelihood ratios under assumptions like multivariate normality. This directly targets changes in the linear relationships between features.
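A brief sketch of the Frobenius-norm comparison; the permutation_p_value helper is an assumed way to calibrate a threshold without normality assumptions, not a method prescribed by the text:

```python
import numpy as np

def covariance_frobenius_distance(ref, tgt):
    """Frobenius norm of the difference between sample covariance matrices."""
    return np.linalg.norm(
        np.cov(ref, rowvar=False) - np.cov(tgt, rowvar=False), ord="fro"
    )

def permutation_p_value(ref, tgt, n_perm=500, seed=0):
    """Approximate p-value for the observed distance via a permutation test."""
    rng = np.random.default_rng(seed)
    observed = covariance_frobenius_distance(ref, tgt)
    pooled = np.vstack([ref, tgt])
    n_ref = len(ref)
    exceed = 0
    for _ in range(n_perm):
        # Reshuffle rows between the two windows and recompute the distance.
        idx = rng.permutation(len(pooled))
        d = covariance_frobenius_distance(pooled[idx[:n_ref]], pooled[idx[n_ref:]])
        exceed += d >= observed
    return (exceed + 1) / (n_perm + 1)
```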
Domain Classifiers (Adversarial Validation): As briefly mentioned in the chapter introduction, training a classification model to distinguish between reference data (label 0) and target data (label 1) is a powerful, model-agnostic technique. If the classifier achieves high accuracy (e.g., AUC significantly greater than 0.5), it indicates that the two datasets are distinguishable, meaning drift has occurred. The features the classifier relies on most heavily can also help diagnose the nature of the drift. This approach is explored in detail later in the "Using Adversarial Validation for Drift Assessment" section.
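As a preview of that section, here is a minimal domain-classifier sketch; the gradient-boosting model and five-fold cross-validation are illustrative choices, not the chapter's prescribed setup:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

def domain_classifier_auc(reference, target, seed=0):
    """Cross-validated AUC of a classifier separating reference from target.

    AUC near 0.5 suggests the datasets are indistinguishable (no drift);
    AUC well above 0.5 indicates the joint distributions differ.
    """
    X = np.vstack([reference, target])
    y = np.concatenate([np.zeros(len(reference)), np.ones(len(target))])
    clf = GradientBoostingClassifier(random_state=seed)
    scores = cross_val_score(clf, X, y, cv=5, scoring="roc_auc")
    return scores.mean()
```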
The best multivariate drift detection method depends on factors like:

- The dimensionality of the feature space and the amount of data available per monitoring window.
- Whether distributional assumptions such as multivariate normality are plausible for your data.
- The computational budget available for monitoring.
- How much interpretability you need, i.e., whether you must know which features or relationships drifted.
In practice, you might employ multiple methods. For instance, you could use the Mahalanobis distance as a quick check on overall distributional shift, supplemented by a domain classifier, run periodically or whenever the distance metric flags potential drift, to obtain a more robust assessment and better interpretability. The hands-on exercise later in this chapter will provide practical experience implementing one of these techniques.