Many problems in machine learning involve solving systems of linear equations, often represented in the compact form $A\mathbf{x} = \mathbf{b}$, where $A$ is a matrix of coefficients (perhaps representing features or model structure), $\mathbf{x}$ is a vector of unknowns (like model parameters), and $\mathbf{b}$ is a vector of target values or outcomes.
One way to approach solving for $\mathbf{x}$ draws inspiration from basic algebra. If you have an equation like $5x = 15$, you solve for $x$ by multiplying both sides by the reciprocal of 5, which is $\frac{1}{5}$ or $5^{-1}$. This gives $5^{-1} \cdot 5x = 5^{-1} \cdot 15$, simplifying to $1 \cdot x = 3$, or $x = 3$. The number $5^{-1}$ is the multiplicative inverse of 5 because $5^{-1} \cdot 5 = 1$.
Can we do something similar for matrix equations? We need the matrix equivalent of a reciprocal, something that "undoes" the effect of matrix $A$. This is called the matrix inverse.
For a given square matrix $A$, its inverse, denoted as $A^{-1}$, is a matrix such that when multiplied by $A$ (in either order), the result is the identity matrix $I$:

$$A^{-1}A = AA^{-1} = I$$
The identity matrix $I$ (a square matrix with 1s on the main diagonal and 0s elsewhere) acts like the number 1 in matrix multiplication: $IA = AI = A$. So, the definition $A^{-1}A = AA^{-1} = I$ is directly analogous to $5^{-1} \cdot 5 = 1$ in scalar arithmetic.
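We can see this behavior numerically with NumPy (a minimal sketch; the matrix values here are arbitrary illustrations, not taken from the text):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 1.0]])
I = np.eye(2)  # 2x2 identity matrix: 1s on the diagonal, 0s elsewhere

# I acts like the number 1: multiplying by it changes nothing.
print(np.allclose(I @ A, A))  # True
print(np.allclose(A @ I, A))  # True
```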
Important Note: The matrix inverse is defined only for square matrices (matrices with the same number of rows and columns, like $n \times n$). However, not all square matrices have an inverse. Matrices that do have an inverse are called invertible or non-singular. Matrices that do not have an inverse are called non-invertible or singular. We'll see how to determine if a matrix is invertible in the section on determinants.
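NumPy makes the invertible/singular distinction concrete: `np.linalg.inv` raises a `LinAlgError` when handed a singular matrix. A short sketch (the matrix here is an illustrative choice, not from the text):

```python
import numpy as np

# The second row is 2x the first, so the rows are linearly
# dependent and this matrix has no inverse (it is singular).
singular = np.array([[1.0, 2.0],
                     [2.0, 4.0]])

try:
    np.linalg.inv(singular)
except np.linalg.LinAlgError as err:
    print(f"Not invertible: {err}")  # "Singular matrix"
```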
Let's look at a concrete example. Consider the matrix

$$A = \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}$$

Its inverse is

$$A^{-1} = \begin{pmatrix} 1 & -1 \\ -1 & 2 \end{pmatrix}$$

We can verify this by multiplying them:

$$A A^{-1} = \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} 1 & -1 \\ -1 & 2 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = I$$

Multiplying in the other order, $A^{-1}A$, also yields the identity matrix $I$.
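We can mirror this verification in code (a minimal sketch using NumPy, with the same matrix as above):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 1.0]])

A_inv = np.linalg.inv(A)
print(A_inv)
# [[ 1. -1.]
#  [-1.  2.]]

# Both multiplication orders recover the identity matrix.
print(np.allclose(A @ A_inv, np.eye(2)))  # True
print(np.allclose(A_inv @ A, np.eye(2)))  # True
```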
Matrix inverses have several useful properties (assuming $A$ and $B$ are invertible and $c$ is a nonzero scalar):

- $(A^{-1})^{-1} = A$: inverting twice returns the original matrix.
- $(AB)^{-1} = B^{-1}A^{-1}$: the inverse of a product is the product of the inverses in reverse order.
- $(A^T)^{-1} = (A^{-1})^T$: inversion and transposition can be performed in either order.
- $(cA)^{-1} = \frac{1}{c}A^{-1}$: scalar factors come out as reciprocals.
These properties are frequently used when manipulating matrix equations in machine learning derivations.
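Each property is easy to spot-check numerically. Here is a brief sketch verifying the product rule $(AB)^{-1} = B^{-1}A^{-1}$ (the specific matrices are illustrative assumptions):

```python
import numpy as np

inv = np.linalg.inv
A = np.array([[2.0, 1.0], [1.0, 1.0]])
B = np.array([[1.0, 3.0], [0.0, 1.0]])

# Note the reversed order on the right-hand side.
print(np.allclose(inv(A @ B), inv(B) @ inv(A)))  # True

# The other properties check out the same way.
print(np.allclose(inv(inv(A)), A))               # (A^-1)^-1 = A
print(np.allclose(inv(A.T), inv(A).T))           # (A^T)^-1 = (A^-1)^T
```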
Thinking back to Chapter 2, where we saw matrices as linear transformations, the inverse matrix $A^{-1}$ represents the transformation that reverses the transformation performed by $A$. If $A$ rotates and scales space in a certain way, applying $A^{-1}$ afterwards will rotate and scale it back to its original state. Applying $A$ and then $A^{-1}$ (or vice versa) results in the identity transformation (represented by $I$), which leaves everything unchanged. This aligns with the algebraic definition $A^{-1}A = AA^{-1} = I$.
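For instance, a rotation matrix is undone by its inverse, which rotates by the opposite angle. A small sketch (the angle and vector are arbitrary illustrations):

```python
import numpy as np

theta = np.pi / 4  # rotate 45 degrees counterclockwise
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

v = np.array([1.0, 0.0])
rotated = R @ v                        # apply the transformation
restored = np.linalg.inv(R) @ rotated  # the inverse undoes it

print(np.allclose(restored, v))  # True: back to the original vector
```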
The primary reason the matrix inverse is significant in this context is that it gives us a formal way to solve the linear system $A\mathbf{x} = \mathbf{b}$. If $A$ is invertible, we can multiply both sides of the equation on the left by $A^{-1}$:

$$A^{-1}A\mathbf{x} = A^{-1}\mathbf{b}$$
$$I\mathbf{x} = A^{-1}\mathbf{b}$$
$$\mathbf{x} = A^{-1}\mathbf{b}$$
This elegant result, $\mathbf{x} = A^{-1}\mathbf{b}$, tells us that if we can find the inverse of the coefficient matrix $A$, we can find the solution vector $\mathbf{x}$ by simply multiplying $A^{-1}$ by the vector $\mathbf{b}$.
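In code, this reads almost exactly like the formula. A sketch with an assumed small system (values chosen for illustration):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 1.0]])
b = np.array([5.0, 3.0])

x = np.linalg.inv(A) @ b  # x = A^{-1} b
print(x)                  # [2. 1.]

# Check: plugging x back in should reproduce b.
print(np.allclose(A @ x, b))  # True
```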
This provides a powerful tool for understanding solutions to linear systems. In the following sections, we will explore how to calculate the inverse and the determinant (which tells us if the inverse exists). We will also discuss why, despite the elegance of the formula $\mathbf{x} = A^{-1}\mathbf{b}$, directly computing the inverse is often not the most numerically stable or efficient way to solve linear systems in practice, especially for large matrices encountered in machine learning.
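As a preview of that discussion: library routines such as NumPy's `np.linalg.solve` solve the system directly (via an LU factorization) without ever forming $A^{-1}$, which is generally faster and more numerically robust. A sketch of the comparison (the random system is an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((200, 200))
b = rng.standard_normal(200)

x_via_inverse = np.linalg.inv(A) @ b  # explicit inverse: avoid in practice
x_via_solve = np.linalg.solve(A, b)   # factorizes A and solves directly

# Both agree here, but solve is the preferred approach for
# large or ill-conditioned systems.
print(np.allclose(x_via_inverse, x_via_solve))  # True
```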