Recall from the chapter introduction that many problems in machine learning involve solving systems of linear equations, often represented in the compact form $Ax = b$, where $A$ is a matrix of coefficients (perhaps representing features or model structure), $x$ is a vector of unknowns (like model parameters), and $b$ is a vector of target values or outcomes.
One way to approach solving for $x$ draws inspiration from basic algebra. If you have an equation like $5x = 10$, you solve for $x$ by multiplying both sides by the reciprocal of 5, which is $1/5$ or $5^{-1}$. This gives $(5^{-1})5x = (5^{-1})10$, simplifying to $1x = 2$, or $x = 2$. The number $5^{-1}$ is the multiplicative inverse of 5 because $5 \times 5^{-1} = 1$.
Can we do something similar for matrix equations? We need the matrix equivalent of a reciprocal, something that "undoes" the effect of the matrix $A$. This is called the matrix inverse.
For a given square matrix $A$, its inverse, denoted $A^{-1}$, is a matrix such that when multiplied by $A$ (in either order), the result is the identity matrix $I$:
$$AA^{-1} = A^{-1}A = I$$
The identity matrix $I$ (a square matrix with 1s on the main diagonal and 0s elsewhere) acts like the number 1 in matrix multiplication: $AI = IA = A$. So, the definition $AA^{-1} = I$ is directly analogous to $a \times a^{-1} = 1$ in scalar arithmetic.
Important Note: The matrix inverse is defined only for square matrices (matrices with the same number of rows and columns, i.e. $n \times n$). However, not all square matrices have an inverse. Matrices that do have an inverse are called invertible or non-singular. Matrices that do not have an inverse are called non-invertible or singular. We'll see how to determine if a matrix is invertible in the section on determinants.
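As a quick illustration of this distinction (a sketch using NumPy, not part of the original text): `np.linalg.inv` succeeds for an invertible matrix but raises `LinAlgError` when the rows are linearly dependent, which is one practical symptom of singularity.

```python
import numpy as np

# An invertible (non-singular) matrix: its rows are linearly independent.
A = np.array([[2.0, 1.0],
              [1.0, 1.0]])
print(np.linalg.inv(A))  # inversion succeeds

# A singular matrix: the second row is exactly twice the first.
S = np.array([[1.0, 2.0],
              [2.0, 4.0]])
try:
    np.linalg.inv(S)
except np.linalg.LinAlgError as err:
    print("Not invertible:", err)
```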
Let's look at a concrete example. Consider the matrix $A$:
$$A = \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}$$
Its inverse is:
$$A^{-1} = \begin{pmatrix} 1 & -1 \\ -1 & 2 \end{pmatrix}$$
We can verify this by multiplying them:
$$AA^{-1} = \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}\begin{pmatrix} 1 & -1 \\ -1 & 2 \end{pmatrix} = \begin{pmatrix} (2)(1)+(1)(-1) & (2)(-1)+(1)(2) \\ (1)(1)+(1)(-1) & (1)(-1)+(1)(2) \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = I$$
Multiplying in the other order, $A^{-1}A$, also yields the identity matrix $I$.
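The same verification can be done numerically. A minimal sketch with NumPy (assuming it is available in your environment):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 1.0]])

# Compute the inverse numerically; it matches the matrix worked out by hand.
A_inv = np.linalg.inv(A)
print(A_inv)  # [[ 1. -1.]
              #  [-1.  2.]]

# Both products give the identity matrix (up to floating-point tolerance).
print(np.allclose(A @ A_inv, np.eye(2)))  # True
print(np.allclose(A_inv @ A, np.eye(2)))  # True
```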
Matrix inverses have several useful properties. For invertible matrices $A$ and $B$ of the same size, and a nonzero scalar $c$:

- $(A^{-1})^{-1} = A$: inverting twice returns the original matrix.
- $(AB)^{-1} = B^{-1}A^{-1}$: the inverse of a product is the product of the inverses in reverse order.
- $(A^T)^{-1} = (A^{-1})^T$: transposition and inversion can be applied in either order.
- $(cA)^{-1} = \frac{1}{c}A^{-1}$: a scalar factor inverts separately.
These properties are frequently used when manipulating matrix equations in machine learning derivations.
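These identities are easy to spot-check numerically. A sketch using NumPy on random matrices (random Gaussian matrices are invertible with probability 1, though this is an assumption for any particular draw):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3))
B = rng.normal(size=(3, 3))
inv = np.linalg.inv

# (A^{-1})^{-1} = A: inverting twice recovers A.
print(np.allclose(inv(inv(A)), A))            # True
# (AB)^{-1} = B^{-1} A^{-1}: note the reversed order.
print(np.allclose(inv(A @ B), inv(B) @ inv(A)))  # True
# (A^T)^{-1} = (A^{-1})^T: transpose and inverse commute.
print(np.allclose(inv(A.T), inv(A).T))        # True
```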
Thinking back to Chapter 2, where we saw matrices as linear transformations, the inverse matrix $A^{-1}$ represents the transformation that reverses the transformation performed by $A$. If $A$ rotates and scales space in a certain way, applying $A^{-1}$ afterwards rotates and scales it back to its original state. Applying $A$ and then $A^{-1}$ (or vice-versa) results in the identity transformation (represented by $I$), which leaves everything unchanged. This aligns with the algebraic definition $AA^{-1} = I$.
The primary reason the matrix inverse is significant in this context is that it gives us a formal way to solve the linear system $Ax = b$. If $A$ is invertible, we can multiply both sides of the equation on the left by $A^{-1}$:
$$\begin{aligned}
Ax &= b \\
A^{-1}(Ax) &= A^{-1}b && \text{Left-multiply by } A^{-1} \\
(A^{-1}A)x &= A^{-1}b && \text{Associativity of matrix multiplication} \\
Ix &= A^{-1}b && \text{Definition of inverse } (A^{-1}A = I) \\
x &= A^{-1}b && \text{Property of identity matrix } (Ix = x)
\end{aligned}$$

This elegant result, $x = A^{-1}b$, tells us that if we can find the inverse of the coefficient matrix $A$, we can find the solution vector $x$ by simply multiplying $A^{-1}$ by the vector $b$.
This provides a powerful conceptual tool for understanding solutions to linear systems. In the following sections, we will explore how to calculate the inverse and the determinant (which tells us if the inverse exists). We will also discuss why, despite the elegance of the formula $x = A^{-1}b$, directly computing the inverse is often not the most numerically stable or efficient way to solve linear systems in practice, especially for large matrices encountered in machine learning.
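To make the contrast concrete, here is a small sketch (using NumPy and the example matrix from earlier, with a right-hand side chosen for illustration) showing both routes to the same solution. `np.linalg.solve` factorizes $A$ and solves the system directly, without ever forming $A^{-1}$, which is the approach preferred in practice:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 1.0]])
b = np.array([3.0, 2.0])

# Conceptual route: explicitly form the inverse, then multiply.
x_via_inv = np.linalg.inv(A) @ b

# Practical route: solve Ax = b directly via factorization.
x_via_solve = np.linalg.solve(A, b)

print(x_via_inv)    # [1. 1.]
print(x_via_solve)  # [1. 1.]
```

Both give $x = (1, 1)$ here, but for large or ill-conditioned systems the direct solve is faster and more numerically stable.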
© 2025 ApX Machine Learning