Singular Value Decomposition (SVD) stands out as one of the most versatile and informative matrix factorizations. Unlike the eigendecomposition we explored in the previous chapter, which applies only to square matrices, SVD works for any $m \times n$ matrix, rectangular or square. This generality makes it useful across many machine learning tasks, from dimensionality reduction to recommendation systems.
The core idea of SVD is to decompose an arbitrary matrix A into the product of three other matrices:
$$A = U \Sigma V^T$$
Let's break down these components:
U: An $m \times m$ orthogonal matrix. Its columns, $u_1, u_2, \dots, u_m$, are called the left singular vectors of A. Being orthogonal means its columns are orthonormal (unit length and mutually perpendicular), so $U^T U = U U^T = I_m$, where $I_m$ is the $m \times m$ identity matrix. These vectors form an orthonormal basis for the column space of A (and its orthogonal complement).
$\Sigma$ (Sigma): An $m \times n$ rectangular diagonal matrix. This is the matrix that holds the singular values of A, denoted by $\sigma_i$. The diagonal entries $\Sigma_{ii} = \sigma_i$ are non-negative real numbers, conventionally arranged in descending order ($\sigma_1 \geq \sigma_2 \geq \dots \geq \sigma_r > 0$), where $r$ is the rank of matrix A. All off-diagonal entries of $\Sigma$ are zero. If $m > n$, the bottom $m - n$ rows consist entirely of zeros; if $n > m$, the rightmost $n - m$ columns consist entirely of zeros. The singular values represent the "importance" or magnitude of the scaling along the principal directions defined by the singular vectors.
$V^T$: An $n \times n$ orthogonal matrix (the transpose of V). The columns of V (equivalently, the rows of $V^T$), $v_1, v_2, \dots, v_n$, are the right singular vectors of A. Like U, V is orthogonal, so $V^T V = V V^T = I_n$. These vectors form an orthonormal basis for the row space of A (and its orthogonal complement, the null space).
The SVD decomposes an $m \times n$ matrix A into an $m \times m$ orthogonal matrix U, an $m \times n$ diagonal matrix $\Sigma$ containing the singular values, and the transpose of an $n \times n$ orthogonal matrix V.
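To make these shapes concrete, here is a minimal sketch using NumPy's np.linalg.svd on a small rectangular matrix. The matrix values are arbitrary and chosen only for illustration; note that NumPy returns the singular values as a 1-D array rather than the full $\Sigma$ matrix, and returns $V^T$ directly.

```python
import numpy as np

# A small 4x3 rectangular matrix (values chosen only for illustration)
A = np.array([
    [3.0, 1.0, 1.0],
    [1.0, 3.0, 1.0],
    [1.0, 1.0, 3.0],
    [2.0, 2.0, 2.0],
])

# full_matrices=True returns U as m x m and Vt as n x n
U, s, Vt = np.linalg.svd(A, full_matrices=True)
print(U.shape, s.shape, Vt.shape)   # (4, 4) (3,) (3, 3)

# Singular values come back sorted in descending order
print(s)

# Rebuild the m x n Sigma matrix from the 1-D array of singular values
Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s)

# Check the factorization A = U @ Sigma @ Vt
print(np.allclose(A, U @ Sigma @ Vt))      # True

# U and V are orthogonal: U^T U = I_m, V^T V = I_n
print(np.allclose(U.T @ U, np.eye(4)))     # True
print(np.allclose(Vt @ Vt.T, np.eye(3)))   # True
```

Counting the nonzero singular values in s recovers the rank of A; this is essentially how np.linalg.matrix_rank works, up to a numerical tolerance.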
The existence of the SVD for any matrix is a fundamental result in linear algebra. It tells us that any linear transformation represented by a matrix A can be viewed as a sequence of three simpler operations: a rotation or reflection ($V^T$), followed by a scaling along orthogonal axes (multiplication by $\Sigma$), followed by another rotation or reflection (U).
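As a quick check of this rotate-scale-rotate picture, the sketch below (again with NumPy, using an arbitrary $2 \times 2$ example) applies the three factors to a vector one at a time and compares the result with applying A directly.

```python
import numpy as np

# A maps 2D inputs to 2D outputs; its SVD splits that map into rotate-scale-rotate
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
U, s, Vt = np.linalg.svd(A)
Sigma = np.diag(s)

x = np.array([1.0, -1.0])

step1 = Vt @ x          # first rotation/reflection (preserves length)
step2 = Sigma @ step1   # scaling by the singular values along orthogonal axes
step3 = U @ step2       # second rotation/reflection

print(np.allclose(A @ x, step3))                              # True
print(np.isclose(np.linalg.norm(x), np.linalg.norm(step1)))   # True: Vt preserves length
```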
While we won't go into the full proof of existence or the detailed algorithms for computation here (as libraries like NumPy handle this efficiently), understanding this structure is essential. In the following sections, we will explore the geometric meaning behind this decomposition and see how it leads to powerful applications like dimensionality reduction and data compression.
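As a small preview of the compression idea, keeping only the largest $k$ singular values and their corresponding singular vectors yields a rank-$k$ approximation of A. In the sketch below, the matrix is random and the choice $k = 10$ is arbitrary; the details are covered in the following sections.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 50))

# full_matrices=False returns only the first min(m, n) singular vectors
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Keep only the k largest singular values and vectors
k = 10
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

print(A_k.shape)                                     # (100, 50), but rank at most k
print(np.linalg.matrix_rank(A_k))                    # 10
print(np.linalg.norm(A - A_k) / np.linalg.norm(A))   # relative reconstruction error
```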