Recommendation systems often encounter the problem of data sparsity, where users have rated few items in common. This sparsity makes it difficult for direct comparison methods to compute meaningful similarity scores: even when such methods work well on dense data, their performance degrades sharply as the overlap between users shrinks. Model-based approaches, particularly matrix factorization, overcome this limitation by moving beyond direct comparisons to instead identify latent features that explain user preferences and item characteristics.
The central idea is to describe both users and items using a set of shared, unobserved attributes known as latent factors. For a movie dataset, these factors might represent genres like "comedy" or "drama," the level of action, the presence of a particular director, or more abstract dimensions that the model learns automatically from the rating patterns. These factors are "latent" because they are not given to us in the dataset; the algorithm's job is to find them.
Matrix factorization accomplishes this by decomposing the large user-item interaction matrix ($R$) into two much smaller matrices: a user-factor matrix ($P$) and an item-factor matrix ($Q$).
By multiplying a user's vector from $P$ with an item's vector from $Q$, we can reconstruct an approximation of the original rating. This allows us to estimate a rating for any user-item pair, effectively "filling in the blanks" of our sparse interaction matrix.
The user-item matrix $R$ is approximated by the product of a user-factor matrix $P$ and an item-factor matrix $Q$:

$$R \approx P \times Q$$
If we have $m$ users, $n$ items, and choose to find $k$ latent factors, our user-factor matrix $P$ will be of size $m \times k$ and our item-factor matrix $Q$ will be $k \times n$. The predicted rating $\hat{r}_{ui}$ for user $u$ and item $i$ is calculated by taking the dot product of the user's vector $p_u$ (the $u$-th row of $P$) and the item's vector $q_i$ (the $i$-th column of $Q$):

$$\hat{r}_{ui} = p_u \cdot q_i = \sum_{f=1}^{k} p_{uf} \, q_{fi}$$
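To make the shapes concrete, here is a minimal NumPy sketch. The matrix sizes and random values are illustrative assumptions, not learned factors:

```python
import numpy as np

# Illustrative sizes: m users, n items, k latent factors.
m, n, k = 5, 4, 2

rng = np.random.default_rng(seed=0)
P = rng.normal(size=(m, k))  # user-factor matrix, one k-dim row per user
Q = rng.normal(size=(k, n))  # item-factor matrix, one k-dim column per item

# Reconstructing the full rating matrix gives an (m, n) array of predictions.
R_hat = P @ Q

# The prediction for user u and item i is the dot product of
# the u-th row of P and the i-th column of Q.
u, i = 2, 3
r_ui = P[u, :] @ Q[:, i]
assert np.isclose(r_ui, R_hat[u, i])
print(f"Predicted rating for user {u}, item {i}: {r_ui:.3f}")
```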
Let's illustrate with a simple two-factor example ($k = 2$) for movies. Imagine our model has learned the following latent factors:

- **Factor 1:** lighthearted comedy (positive values) versus serious drama (negative values)
- **Factor 2:** character-driven (positive values) versus action-oriented (negative values)
Now, consider a user, Alex, and two movies, Movie A and Movie B. The algorithm might represent them with the following vectors:
- **Alex:** $[0.9, 0.8]$. Alex strongly prefers lighthearted comedies and character-driven films.
- **Movie A:** $[0.7, 0.6]$. This movie is a fairly lighthearted, character-driven comedy.
- **Movie B:** $[-0.8, -0.9]$. This movie is a serious, action-oriented drama.

We can predict Alex's rating for each movie:

$$\hat{r}_{\text{Alex},A} = (0.9)(0.7) + (0.8)(0.6) = 0.63 + 0.48 = 1.11$$

$$\hat{r}_{\text{Alex},B} = (0.9)(-0.8) + (0.8)(-0.9) = -0.72 - 0.72 = -1.44$$
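The same arithmetic takes only a few lines of NumPy, using the vectors from the example above:

```python
import numpy as np

alex    = np.array([0.9, 0.8])    # Alex's user-factor vector
movie_a = np.array([0.7, 0.6])    # Movie A's item-factor vector
movie_b = np.array([-0.8, -0.9])  # Movie B's item-factor vector

# A predicted rating is the dot product of the user and item vectors.
print(alex @ movie_a)  # 0.9*0.7 + 0.8*0.6 = 1.11
print(alex @ movie_b)  # 0.9*(-0.8) + 0.8*(-0.9) = -1.44
```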
The high positive score for Movie A and the high negative score for Movie B align with our intuition. The model predicts that Alex will enjoy Movie A and dislike Movie B. The strength of matrix factorization is that the model learns these factor representations on its own, just by trying to find matrices $P$ and $Q$ whose product best reconstructs the known ratings in $R$.
This approach offers several important benefits:

- **Handles sparsity:** a rating can be estimated for any user-item pair, even when users have rated few items in common.
- **Dimensionality reduction:** the large $m \times n$ rating matrix is compressed into two much smaller matrices of sizes $m \times k$ and $k \times n$.
- **Automatic feature discovery:** the latent factors are learned directly from rating patterns, with no manual feature engineering required.
Of course, the main challenge is to find the optimal values for the matrices $P$ and $Q$. The model must "learn" the factor vectors that minimize the difference between predicted ratings ($\hat{r}_{ui}$) and actual, known ratings ($r_{ui}$). The following sections will cover the algorithms used to perform this factorization and learning process.
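As a preview of what "learning" must optimize, here is a minimal sketch of the objective: the squared error between predictions and the known entries of $R$. The tiny rating matrix and the random starting factors are illustrative assumptions; the actual optimization algorithms are the subject of the next sections:

```python
import numpy as np

# A tiny example rating matrix; np.nan marks missing (unknown) ratings.
R = np.array([
    [5.0,    np.nan, 1.0],
    [np.nan, 4.0,    np.nan],
    [1.0,    2.0,    5.0],
])
known = ~np.isnan(R)  # mask selecting only the observed ratings

# Candidate factor matrices. Here they start random; a learning
# algorithm would iteratively adjust them to reduce the error below.
rng = np.random.default_rng(seed=1)
P = rng.normal(scale=0.5, size=(3, 2))  # 3 users x 2 factors
Q = rng.normal(scale=0.5, size=(2, 3))  # 2 factors x 3 items

# The quantity to minimize: squared error between predicted and
# actual ratings, summed over the known entries only.
R_hat = P @ Q
error = np.sum((R[known] - R_hat[known]) ** 2)
print(f"Squared error over known ratings: {error:.3f}")
```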