Singular Value Decomposition (SVD) is a foundational linear algebra technique used for matrix factorization. It provides a principled way to break down any matrix into a product of three other matrices, revealing its underlying structure and facilitating the discovery of latent factors.
Formally, SVD states that any rectangular matrix $R$ of size $m \times n$ (representing $m$ users and $n$ items) can be decomposed into the product of three matrices:

$$R = U \Sigma V^T$$
Let's examine each component:

- $U$: an $m \times m$ matrix whose columns are the left singular vectors. Its rows relate each user to the latent factors.
- $\Sigma$: an $m \times n$ diagonal matrix whose entries, the singular values, measure the strength of each latent factor.
- $V^T$: an $n \times n$ matrix whose rows are the right singular vectors. Its columns relate each item to the latent factors.

The decomposition of the user-item matrix $R$ into three distinct matrices: $U$, $\Sigma$, and $V^T$.
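To make the decomposition concrete, here is a minimal NumPy sketch on a small, fully observed toy matrix (the ratings and dimensions are made up for illustration):

```python
import numpy as np

# A small, fully observed ratings matrix: 4 users x 3 items (illustrative values)
R = np.array([
    [5.0, 3.0, 1.0],
    [4.0, 3.0, 1.0],
    [1.0, 1.0, 5.0],
    [1.0, 2.0, 4.0],
])

# Classical SVD: R = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(R, full_matrices=False)

# Multiplying the factors back together recovers the original matrix
R_reconstructed = U @ np.diag(s) @ Vt
print(np.allclose(R, R_reconstructed))  # True
print(s)  # singular values, sorted from largest to smallest
```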
There is a significant challenge when applying this "pure" form of SVD directly to recommendation problems: it requires a complete matrix with no missing values. Our user-item interaction matrix, $R$, is almost always sparse, meaning most entries are unknown. If we were to fill the missing entries with zeros, the algorithm would interpret those as actual ratings of 0, which would heavily skew the resulting factors.
Because of this, we don't use the classical SVD algorithm. Instead, we use algorithms that are inspired by SVD. These methods aim to find factor matrices that approximate the original matrix only for the ratings we know about. The goal is no longer to perfectly reconstruct $R$, but to find the latent factor matrices $P$ (for users) and $Q$ (for items) that best model the observed user-item interactions.
This modified approach is often still referred to as SVD in the recommender systems literature, though it's technically an approximation. The objective is to find $P$ and $Q$ such that their product, $PQ^T$, is a good approximation of $R$.
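As a sketch of what this looks like (the notation here is one common convention, not necessarily the exact formulation used later), the factors are chosen to minimize the squared error over only the known ratings, usually with a regularization term $\lambda$ to discourage overfitting:

$$\min_{P,\, Q} \sum_{(u, i) \in \mathcal{K}} \left( r_{ui} - p_u \cdot q_i \right)^2 + \lambda \left( \lVert p_u \rVert^2 + \lVert q_i \rVert^2 \right)$$

Here $\mathcal{K}$ denotes the set of user-item pairs with known ratings, and $p_u$ and $q_i$ are the rows of $P$ and $Q$ for user $u$ and item $i$.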
The real power of SVD in recommendations comes from using it for dimensionality reduction. The singular values in the matrix $\Sigma$ are sorted by importance. The first singular value corresponds to the most significant pattern in the data, the second to the next most significant, and so on. Many of the later singular values are often small and can be treated as noise.
We can exploit this by keeping only the top $k$ latent factors, where $k$ is a number much smaller than the original number of users or items. This process is known as Truncated SVD. We reduce the dimensionality of our matrices:

- $U_k$: an $m \times k$ matrix (the first $k$ columns of $U$).
- $\Sigma_k$: a $k \times k$ diagonal matrix (the $k$ largest singular values).
- $V_k^T$: a $k \times n$ matrix (the first $k$ rows of $V^T$).
Our new approximation of the ratings matrix, $R_k$, is:

$$R_k = U_k \Sigma_k V_k^T$$
This has two major benefits:

1. It filters out noise: keeping only the strongest patterns helps the model generalize rather than memorize individual ratings.
2. It is far more compact: we store and compute with small factor matrices instead of the full user-item matrix.
The approximation of $R$ using truncated matrices, where $k$ is the number of latent factors. This reduces the dimensionality and captures the most significant patterns.
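One way to compute such a truncated decomposition in practice is with `scipy.sparse.linalg.svds`, which returns only the top $k$ singular values and vectors. The matrix and the choice of $k = 2$ below are illustrative:

```python
import numpy as np
from scipy.sparse.linalg import svds

# Dense toy matrix for illustration; in practice R is large and sparse
R = np.array([
    [5.0, 3.0, 1.0, 1.0],
    [4.0, 3.0, 1.0, 1.0],
    [1.0, 1.0, 5.0, 4.0],
    [1.0, 2.0, 4.0, 5.0],
])

k = 2  # number of latent factors to keep

# svds computes the k largest singular values and their vectors
# (note: it returns them in ascending order of singular value)
U_k, s_k, Vt_k = svds(R, k=k)

# Rank-k approximation of the ratings matrix
R_k = U_k @ np.diag(s_k) @ Vt_k
print(R_k.round(2))
```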
Once our model has learned the user-factor matrix $P$ (analogous to $U_k$) and the item-factor matrix $Q$ (analogous to $V_k$), making a prediction is straightforward.
Each user $u$ is represented by a vector $p_u$ of length $k$, and each item $i$ is represented by a vector $q_i$ of length $k$. The predicted rating is simply the dot product of these two vectors:

$$\hat{r}_{ui} = p_u \cdot q_i$$
This dot product measures the alignment between a user's preferences and an item's characteristics in the learned latent space. A user vector with high values for factors that also have high values in an item's vector will result in a high predicted rating.
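A tiny sketch with hypothetical factor vectors (the values are made up) shows how such a prediction is computed:

```python
import numpy as np

k = 3  # number of latent factors (illustrative)

# Hypothetical learned factors for one user and one item
p_u = np.array([0.9, 0.1, 1.2])  # user's affinity for each latent factor
q_i = np.array([1.1, 0.0, 0.8])  # item's expression of each latent factor

# Predicted rating is the dot product of the two k-dimensional vectors
r_hat = p_u @ q_i
print(r_hat)  # 0.9*1.1 + 0.1*0.0 + 1.2*0.8 = 1.95
```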
The main task, then, is to find the values for the matrices $P$ and $Q$. Since we cannot use classical SVD on our sparse matrix, we must turn to other methods. We reframe the problem as an optimization task: find the latent factors that minimize the prediction error on the ratings we already know. The next section will cover how we can achieve this using an iterative optimization algorithm called Stochastic Gradient Descent.