Singular Value Decomposition (SVD) provides a way to break down any matrix A into the product $A = U \Sigma V^T$. This decomposition isn't just mathematically elegant; it turns out to be extremely useful for simplifying data, particularly for dimensionality reduction.
The core idea relies on the diagonal matrix $\Sigma$. Its diagonal entries, denoted $\sigma_1, \sigma_2, \dots, \sigma_r$ (where r is the rank of A), are the singular values of A. By convention, these are sorted in descending order: $\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_r > 0$. These singular values measure the importance or "energy" captured along the directions defined by the corresponding columns of U (the left singular vectors) and V (the right singular vectors).
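As a quick check of this structure, the minimal sketch below (using a small, randomly generated matrix purely for illustration) shows that numpy.linalg.svd returns the singular values already sorted in descending order and that the factors reproduce A:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(6, 4))   # small illustrative matrix

# Compact SVD: U is (6, 4), s holds the singular values, Vt is (4, 4).
U, s, Vt = np.linalg.svd(A, full_matrices=False)

print(s)                                      # sorted: s[0] >= s[1] >= ...
print(np.allclose(A, U @ np.diag(s) @ Vt))    # True: A = U Sigma V^T
```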
Dimensionality reduction using SVD works by keeping only the most significant parts of this decomposition. Specifically, we select the first k singular values (where k<r) and their associated singular vectors. We discard the remaining r−k singular values and vectors, which correspond to the directions where the data varies the least.
Mathematically, we form an approximation of A by truncating the matrices U, $\Sigma$, and V: $U_k$ keeps the first k columns of U (an m×k matrix), $\Sigma_k$ is the top-left k×k block of $\Sigma$ containing the k largest singular values, and $V_k$ keeps the first k columns of V (an n×k matrix).
The lower-rank approximation of A is then given by:
$$A_k = U_k \Sigma_k V_k^T$$
This matrix $A_k$ has rank k and is the best rank-k approximation of the original matrix A, in the sense that it minimizes the Frobenius norm of the difference, $\|A - A_k\|_F$. This optimality is a fundamental property often referred to as the Eckart-Young theorem, although the outcome matters more than the name: SVD gives you the optimal way to compress your matrix A into a rank-k representation.
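A minimal sketch of the truncation in NumPy follows; the matrix, its size, and the choice of k are purely illustrative. It also checks the Eckart-Young property numerically, since the Frobenius error of $A_k$ equals the square root of the sum of the squared discarded singular values.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(100, 20))   # hypothetical data matrix
k = 5                            # rank of the approximation

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Keep the first k columns of U, the first k singular values,
# and the first k rows of V^T.
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Frobenius error equals the energy in the discarded singular values.
frob_error = np.linalg.norm(A - A_k, "fro")
print(np.isclose(frob_error, np.sqrt(np.sum(s[k:] ** 2))))   # True
```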
Think of the original m×n matrix A as representing m data points, each with n features (or vice versa). SVD finds new, orthogonal bases for the row space (via V) and column space (via U) of A. The transformation A essentially maps vectors from the row space basis to the column space basis, scaling them by the singular values.
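The sketch below illustrates this mapping for a single direction: applying A to the i-th right singular vector yields the i-th left singular vector scaled by $\sigma_i$ (again, a random matrix is used purely for illustration).

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(8, 5))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

i = 0
v_i = Vt[i, :]    # i-th right singular vector (row of V^T)
u_i = U[:, i]     # i-th left singular vector

# A maps v_i to sigma_i * u_i.
print(np.allclose(A @ v_i, s[i] * u_i))   # True
```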
By setting the smaller singular values to zero (effectively what truncating to Ak does), we project the original data onto a lower-dimensional subspace spanned by the principal directions associated with the largest singular values. We retain the most significant variations while discarding the less informative ones.
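The reduced representation of the data itself is just these projected coordinates. In the sketch below (hypothetical data and k), each of the m rows of A is expressed by k numbers instead of n; the two equivalent forms $A V_k$ and $U_k \Sigma_k$ give the same result.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(100, 20))   # hypothetical data: 100 samples, 20 features
k = 5

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Coordinates of each sample along the k leading directions: shape (100, 5).
Z = A @ Vt[:k, :].T        # project the rows of A onto the top-k directions
Z_alt = U[:, :k] * s[:k]   # equivalently, U_k Sigma_k

print(np.allclose(Z, Z_alt))   # True
```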
The choice of k, the number of dimensions to retain, is application-dependent. Common strategies include inspecting a scree plot of the sorted singular values and looking for an "elbow" where they level off, choosing the smallest k whose cumulative explained variance (energy) exceeds a target threshold such as 90-95%, or fixing k based on downstream constraints like storage or model input size; a sketch of the threshold approach follows the figure below.
Figure: a typical plot showing the sorted singular values decreasing rapidly while the cumulative variance explained quickly approaches 100%. This helps in selecting a suitable value for k.
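As a concrete example of the threshold strategy, the following sketch (with a hypothetical random matrix and an arbitrarily chosen 95% threshold) picks the smallest k that retains the desired share of the total squared singular-value energy:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(200, 50))   # hypothetical data matrix

_, s, _ = np.linalg.svd(A, full_matrices=False)

# Fraction of total "energy" (sum of squared singular values)
# retained by the first k components.
energy = np.cumsum(s ** 2) / np.sum(s ** 2)

# Smallest k whose cumulative energy exceeds the chosen threshold.
# (With real, structured data the energy usually concentrates in far
# fewer components than with this random example.)
threshold = 0.95
k = int(np.searchsorted(energy, threshold) + 1)
print(k, energy[k - 1])
```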
Dimensionality reduction using SVD is closely related to Principal Component Analysis (PCA). In fact, SVD provides a numerically stable way to compute the principal components of a dataset. If the data matrix A has its columns centered (mean subtracted), the right singular vectors (columns of V) correspond to the principal directions, and the squared singular values ($\sigma_i^2$) are proportional to the variance captured by each principal component. Applying the truncated SVD $A_k = U_k \Sigma_k V_k^T$ effectively projects the data onto the first k principal components.
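A minimal sketch of this correspondence, using a hypothetical dataset: after centering the columns, the rows of $V^T$ are the principal directions and $\sigma_i^2 / (m - 1)$ gives the variance explained by each component.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))   # hypothetical dataset: 200 samples, 10 features

# Center each column (feature) so the SVD corresponds to PCA.
Xc = X - X.mean(axis=0)

U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Rows of Vt are the principal directions; variance captured by
# component i is sigma_i^2 / (m - 1).
explained_variance = s ** 2 / (X.shape[0] - 1)

k = 3
scores = Xc @ Vt[:k, :].T   # data expressed in the first k principal components
print(explained_variance[:k])
print(scores.shape)          # (200, 3)
```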
Using SVD for dimensionality reduction offers several advantages: the truncated decomposition is the provably optimal rank-k approximation in the Frobenius-norm sense, it requires no labels and applies to any real matrix, discarding the smallest singular values often filters out noise, and mature, numerically stable implementations are widely available.
However, there are considerations: computing the full SVD of a large matrix is expensive (roughly proportional to m·n·min(m, n) for an m×n matrix), the resulting components are linear combinations of the original features and can be hard to interpret, the data should be centered if a PCA-style variance interpretation is desired, and the choice of k trades compression against information loss.
In practice, the full decomposition is readily available through NumPy (numpy.linalg.svd) and SciPy (scipy.linalg.svd).
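When only a few components are needed from a very large matrix, computing the full decomposition can be avoided. The minimal sketch below (matrix sizes and k are purely illustrative) uses scipy.sparse.linalg.svds, which computes only the k largest singular triplets:

```python
import numpy as np
from scipy.sparse.linalg import svds

rng = np.random.default_rng(0)
A = rng.normal(size=(5000, 300))   # hypothetical large data matrix

k = 10
# Compute only the k largest singular triplets, which is much cheaper
# than a full SVD when k is small relative to the matrix dimensions.
U, s, Vt = svds(A, k=k)

# svds does not guarantee descending order, so sort explicitly.
order = np.argsort(s)[::-1]
U, s, Vt = U[:, order], s[order], Vt[order, :]

A_k = U @ np.diag(s) @ Vt   # rank-k approximation without the full SVD
print(s[:3])
```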
Despite the computational cost of the decomposition itself, SVD remains a fundamental tool for reducing the complexity of datasets in many machine learning pipelines, enabling more efficient processing and sometimes leading to better model generalization by filtering out noise.