Alongside SVD and LU decomposition, another highly useful way to factorize a matrix is the QR decomposition. For an m×n matrix A (often with m≥n and linearly independent columns), the QR decomposition finds two specific matrices, Q and R, such that:
$$A = QR$$
Here, Q is an m×n matrix with orthonormal columns, and R is an n×n upper triangular matrix. Let's examine these components more closely.
The matrix Q has a special property: its columns form an orthonormal set. Recall from Chapter 4 that this means two things: each column has unit length ($\|q_i\|_2 = 1$), and any two distinct columns are orthogonal to each other ($q_i^T q_j = 0$ for $i \neq j$).
Combining these, we can state the defining property of a matrix Q with orthonormal columns concisely using matrix multiplication:
$$Q^T Q = I$$
where I is the n×n identity matrix.
A square matrix satisfying this condition is called an orthogonal matrix. (If Q is square, $Q^T Q = I$ also implies $Q Q^T = I$, making $Q^T$ the inverse of Q: $Q^T = Q^{-1}$. A rectangular Q with orthonormal columns satisfies $Q^T Q = I$ but not, in general, $Q Q^T = I$.)
A significant geometric implication of orthogonal matrices is that multiplication by Q (or $Q^T$) preserves lengths and angles. Transformations represented by orthogonal matrices are rigid motions, like rotations or reflections. In the context of QR decomposition, the columns of Q form an orthonormal basis for the column space of the original matrix A. This process is conceptually similar to the Gram-Schmidt procedure, which takes a set of linearly independent vectors (the columns of A) and produces an orthonormal set (Q) that spans the same space.
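To make this concrete, here is a small NumPy sketch, using arbitrary example values, that computes a QR decomposition and checks the two properties just described: $Q^T Q = I$ and the preservation of vector lengths under multiplication by Q.

```python
import numpy as np

# A small tall matrix with linearly independent columns (arbitrary example values).
A = np.array([[1.0, 2.0],
              [0.0, 1.0],
              [1.0, 0.0],
              [2.0, 1.0]])

# Reduced QR: Q is 4x2 with orthonormal columns, R is 2x2 upper triangular.
Q, R = np.linalg.qr(A)

# Orthonormal columns: Q^T Q should be the 2x2 identity (up to rounding).
print(np.allclose(Q.T @ Q, np.eye(2)))   # True

# Multiplication by Q preserves length: ||Qx|| = ||x||, since Q^T Q = I.
x = np.array([3.0, -1.0])
print(np.isclose(np.linalg.norm(Q @ x), np.linalg.norm(x)))  # True
```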
The matrix R is an n×n upper triangular matrix. This means all entries below the main diagonal are zero:
$$R = \begin{pmatrix} r_{11} & r_{12} & \cdots & r_{1n} \\ 0 & r_{22} & \cdots & r_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & r_{nn} \end{pmatrix}$$

The matrix R essentially encodes the information about how the original columns of A are related to the orthonormal basis vectors in Q. Specifically, each column $a_j$ of A can be expressed as a linear combination of the first $j$ columns of Q, with the coefficients coming from the $j$-th column of R.
The upper triangular structure of R is computationally advantageous, particularly when solving systems of linear equations, as we'll see next.
The QR decomposition factorizes matrix A into an orthogonal matrix Q (columns are orthonormal vectors) and an upper triangular matrix R.
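The following short NumPy check, again with made-up example values, verifies both points: R comes out upper triangular, and each column of A is recovered from the leading columns of Q using the coefficients in the corresponding column of R.

```python
import numpy as np

# Arbitrary 4x3 matrix with linearly independent columns.
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 2.0],
              [2.0, 1.0, 1.0]])

Q, R = np.linalg.qr(A)

# R is upper triangular: everything below the main diagonal is (numerically) zero.
print(np.allclose(R, np.triu(R)))   # True

# Column j of A (0-indexed) is a combination of columns 0..j of Q,
# with coefficients taken from column j of R.
for j in range(A.shape[1]):
    reconstructed = Q[:, :j + 1] @ R[:j + 1, j]
    print(np.allclose(reconstructed, A[:, j]))   # True for every column
```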
One of the most important applications of QR decomposition is in solving linear least-squares problems. Recall that these problems often arise in machine learning when fitting models, like linear regression, where we want to find a vector x that minimizes the difference between $Ax$ and a target vector b, i.e., minimize $\|Ax - b\|_2^2$.
The standard approach involves solving the normal equations:
$$A^T A x = A^T b$$
However, computing $A^T A$ can sometimes lead to numerical stability issues, especially if the columns of A are close to being linearly dependent. The matrix $A^T A$ might become ill-conditioned (have a large condition number), making the solution sensitive to small errors; in fact, the condition number of $A^T A$ is the square of the condition number of A.
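As a quick illustration of why this matters, the sketch below uses a hypothetical matrix with two nearly parallel columns and compares the condition numbers of A and $A^T A$:

```python
import numpy as np

# Two nearly parallel columns make A ill-conditioned (hypothetical values).
eps = 1e-4
A = np.array([[1.0, 1.0],
              [0.0, eps],
              [0.0, 0.0]])

cond_A = np.linalg.cond(A)          # large (roughly 2 / eps here)
cond_AtA = np.linalg.cond(A.T @ A)  # roughly cond_A squared: much larger

print(cond_A, cond_AtA, cond_A**2)
```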
QR decomposition provides a more numerically stable alternative. If we substitute $A = QR$ into the original least-squares objective (or directly into $Ax = b$ for systems where a solution exists), we get:

$$QRx = b$$

Now, multiply both sides on the left by $Q^T$:

$$Q^T Q R x = Q^T b$$

Since $Q^T Q = I$, this simplifies beautifully to:

$$Rx = Q^T b$$
This gives us a system of linear equations with an upper triangular matrix R. Such systems are very easy and numerically stable to solve using a process called back-substitution. We first solve for the last component of x, then substitute that value back into the second-to-last equation to solve for the next component, and so on, working our way up.
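Here is a minimal sketch of back-substitution in NumPy, with a check against a general-purpose solver; the triangular matrix and right-hand side are arbitrary example values:

```python
import numpy as np

def back_substitution(R, y):
    """Solve Rx = y for an upper triangular, invertible R."""
    n = R.shape[0]
    x = np.zeros(n)
    # Work from the last row up: equation i involves only x[i], x[i+1], ..., x[n-1].
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - R[i, i + 1:] @ x[i + 1:]) / R[i, i]
    return x

# Quick check against NumPy's general solver (example values).
R = np.array([[2.0, 1.0, -1.0],
              [0.0, 3.0,  2.0],
              [0.0, 0.0,  4.0]])
y = np.array([1.0, 2.0, 8.0])
print(back_substitution(R, y))
print(np.linalg.solve(R, y))   # should match
```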
Because this method avoids forming $A^T A$, it generally has better numerical properties than using the normal equations directly, making it the preferred method in many numerical libraries.
While the conceptual idea behind QR decomposition can be understood through the Gram-Schmidt process, practical implementations typically use more numerically stable algorithms like Householder reflections or Givens rotations. These methods construct Q and R iteratively. Fortunately, you rarely need to implement these yourself; numerical libraries like NumPy and SciPy provide efficient and reliable functions to compute the QR decomposition, which we will utilize in the hands-on sections.
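As a preview of that workflow, here is a hedged sketch of solving a small least-squares problem via `numpy.linalg.qr` and SciPy's `solve_triangular`, compared against NumPy's built-in `lstsq`; the data are randomly generated purely for illustration:

```python
import numpy as np
from scipy.linalg import solve_triangular

# Hypothetical regression data: 6 samples, an intercept column plus 2 features.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(6), rng.normal(size=(6, 2))])
b = rng.normal(size=6)

# Least squares via QR: factor X, then solve the triangular system R x = Q^T b.
Q, R = np.linalg.qr(X)               # Q is 6x3 with orthonormal columns, R is 3x3
x_qr = solve_triangular(R, Q.T @ b)  # back-substitution on the upper triangular system

# Reference solution from NumPy's built-in least-squares routine.
x_ref, *_ = np.linalg.lstsq(X, b, rcond=None)
print(np.allclose(x_qr, x_ref))      # True
```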