Making Predictions with Weighted Averages

To estimate a rating for an item a user hasn't seen, neighborhood-based collaborative filtering leverages the behavior of similar users or items, commonly known as 'neighbors'. The prediction is a weighted average of the ratings from these neighbors. The core idea is simple: the opinions of more similar neighbors should have more influence on the prediction.

Let's examine how this works for both user-based and item-based approaches.

User-Based Prediction

In a user-based approach, we predict a user's rating for an item based on the ratings given to that same item by similar users. However, a simple average is often misleading because different users have different rating scales. One user might rate movies from 3 to 5 stars, while another uses the full 1-to-5 range.

To account for this, we use the deviation from each user's average rating. The prediction formula for user $u$'s rating on item $i$ is:

$$P_{u,i} = \bar{r}_u + \frac{\sum_{v \in N} \text{sim}(u, v) \cdot (r_{v,i} - \bar{r}_v)}{\sum_{v \in N} |\text{sim}(u, v)|}$$

Let's break down this formula:

$P_{u,i}$ is the predicted rating for our target user $u$ on item $i$.
$\bar{r}_u$ is the average rating given by the target user $u$. We add it back at the end to return the prediction to their original rating scale.
$N$ is the neighborhood of users most similar to $u$ who have rated item $i$.
$\text{sim}(u, v)$ is the similarity score between user $u$ and a neighbor user $v$.
$r_{v,i} - \bar{r}_v$ is neighbor $v$'s rating for item $i$, adjusted by subtracting their average rating $\bar{r}_v$. This value represents how much better or worse than average the neighbor found the item.
The denominator, $\sum_{v \in N} |\text{sim}(u, v)|$, is a normalization term that sums the absolute values of the similarity weights. Using absolute values keeps the normalization well behaved when some similarities are negative, as can happen with correlation-based measures.
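The formula above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `predict_user_based`, the tuple layout, and the fallback to the user's own mean when the denominator is zero are all assumptions made for this example, and the two neighbors are hypothetical data.

```python
from typing import Sequence, Tuple


def predict_user_based(
    target_mean: float,
    neighbors: Sequence[Tuple[float, float, float]],
) -> float:
    """Predict a rating from (similarity, rating, neighbor_mean) triples."""
    # Numerator: similarity-weighted deviations from each neighbor's mean.
    numerator = sum(sim * (rating - mean) for sim, rating, mean in neighbors)
    # Denominator: sum of absolute similarities.
    denominator = sum(abs(sim) for sim, _, _ in neighbors)
    if denominator == 0:
        # Assumption: with no usable neighbors, fall back to the user's mean.
        return target_mean
    return target_mean + numerator / denominator


# Hypothetical data: one positively and one negatively similar neighbor.
prediction = predict_user_based(3.0, [(1.0, 5.0, 4.0), (-0.5, 2.0, 3.0)])
print(prediction)  # 4.0
```

In this example the negatively similar neighbor rated the item below their own average, so their contribution pushes the prediction up rather than down.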

An Example Calculation

Imagine we want to predict how you will rate the movie Blade Runner 2049. Your average rating for all movies is 3.5. We find three similar users (your neighbors) who have also rated this movie.

User	Similarity to You	Rating for Blade Runner 2049	User's Average Rating
Alex	0.9	5.0	4.0
Ben	0.8	4.0	3.2
Chris	0.5	3.0	3.8

First, we calculate the weighted sum of the neighbors' adjusted ratings:

Alex: $0.9 \cdot (5.0 - 4.0) = 0.9 \cdot 1.0 = 0.9$
Ben: $0.8 \cdot (4.0 - 3.2) = 0.8 \cdot 0.8 = 0.64$
Chris: $0.5 \cdot (3.0 - 3.8) = 0.5 \cdot (-0.8) = -0.4$

The numerator is the sum of these values: $0.9 + 0.64 + (-0.4) = 1.14$. The denominator is the sum of the absolute similarity scores: $|0.9| + |0.8| + |0.5| = 2.2$.

Now, we plug these into the formula: $P_{\text{You}, \text{Blade Runner}} = 3.5 + \frac{1.14}{2.2} \approx 3.5 + 0.52 = 4.02$

Our model predicts you would rate Blade Runner 2049 approximately 4.02. This prediction is influenced more by Alex, your most similar neighbor, who loved the movie.
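The same arithmetic can be checked with a short script. This is only a sketch that reuses the numbers from the table; the variable names are chosen for this example.

```python
# (name, similarity to you, rating for the movie, neighbor's average rating)
neighbors = [
    ("Alex", 0.9, 5.0, 4.0),
    ("Ben", 0.8, 4.0, 3.2),
    ("Chris", 0.5, 3.0, 3.8),
]
your_mean = 3.5

# Each neighbor's contribution: similarity times deviation from their mean.
contributions = {
    name: sim * (rating - mean) for name, sim, rating, mean in neighbors
}
numerator = sum(contributions.values())
denominator = sum(abs(sim) for _, sim, _, _ in neighbors)
prediction = your_mean + numerator / denominator

for name, value in contributions.items():
    print(f"{name}: {value:+.2f}")
print(f"prediction = {prediction:.2f}")  # prediction = 4.02
```

Printing the individual contributions makes it easy to see which neighbors pull the prediction up and which pull it down.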

Figure: The prediction process for a user-based recommender. The opinions of more similar neighbors (like Alex) contribute more to the final predicted rating.

Item-Based Prediction

For an item-based approach, the logic is parallel but simpler. To predict user $u$'s rating for item $i$, we look at other items that user $u$ has already rated. We then take a weighted average of those ratings, where the weights are the similarities between item $i$ and the other items.

The formula is:

$$P_{u,i} = \frac{\sum_{j \in N} \text{sim}(i, j) \cdot r_{u,j}}{\sum_{j \in N} |\text{sim}(i, j)|}$$

Here's the breakdown:

$P_{u,i}$ is the predicted rating for user $u$ on the target item $i$.
$N$ is the neighborhood of items most similar to item $i$ that user $u$ has already rated.
$\text{sim}(i, j)$ is the similarity between our target item $i$ and a neighbor item $j$.
$r_{u,j}$ is the actual rating user $u$ gave to the neighbor item $j$.

Notice that we don't need to adjust for user averages here. The entire calculation is based on the ratings from a single user ($u$), so the rating scale is inherently consistent.
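As a rough sketch, the item-based formula reduces to a plain weighted average. The function name `predict_item_based`, the pair layout, and the choice to return `None` when no similar rated items exist are assumptions for this illustration, and the two items are hypothetical data.

```python
from typing import Optional, Sequence, Tuple


def predict_item_based(
    neighbors: Sequence[Tuple[float, float]],
) -> Optional[float]:
    """Predict from (similarity to target item, user's rating) pairs."""
    numerator = sum(sim * rating for sim, rating in neighbors)
    denominator = sum(abs(sim) for sim, _ in neighbors)
    if denominator == 0:
        # Assumption: no prediction is made without similar rated items.
        return None
    return numerator / denominator


# Hypothetical data: two equally similar items rated 4.0 and 2.0.
prediction = predict_item_based([(0.5, 4.0), (0.5, 2.0)])
print(prediction)  # 3.0
```

With equal similarities the result is simply the mean of the user's two ratings; unequal similarities shift it toward the rating of the more similar item.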

An Example Calculation

Let's stick with predicting your rating for Blade Runner 2049. This time, we use an item-based approach. We find three movies similar to Blade Runner 2049 that you have already rated.

Similar Movie (Item j)	Similarity to Blade Runner 2049	Your Rating for This Movie
Dune	0.95	5.0
Arrival	0.88	4.0
The Matrix	0.70	4.0

The weighted sum of your ratings (the numerator) is:

$(0.95 \cdot 5.0) + (0.88 \cdot 4.0) + (0.70 \cdot 4.0)$
$= 4.75 + 3.52 + 2.8 = 11.07$

The sum of the similarity weights (the denominator) is:

$|0.95| + |0.88| + |0.70| = 2.53$

The predicted rating is: $P_{\text{You}, \text{Blade Runner}} = \frac{11.07}{2.53} \approx 4.38$

The item-based model predicts you would rate the movie 4.38, based on your positive ratings of similar sci-fi films.
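The item-based arithmetic can be reproduced with a short script. This is a sketch that reuses the numbers from the table above; the variable names are chosen for this example.

```python
# (similar movie, similarity to Blade Runner 2049, your rating)
rated_similar_movies = [
    ("Dune", 0.95, 5.0),
    ("Arrival", 0.88, 4.0),
    ("The Matrix", 0.70, 4.0),
]

# Weighted sum of your ratings, then the sum of absolute similarities.
numerator = sum(sim * rating for _, sim, rating in rated_similar_movies)
denominator = sum(abs(sim) for _, sim, _ in rated_similar_movies)
prediction = numerator / denominator

print(f"numerator   = {numerator:.2f}")
print(f"denominator = {denominator:.2f}")
print(f"prediction  = {prediction:.4f}")
```

Printing the prediction to four decimal places shows the unrounded quotient, which is useful when checking how the final figure was rounded.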

By applying these weighted average formulas, we can transform similarity scores into concrete, personalized rating predictions. These predictions form the basis for our recommendation lists, allowing us to rank unseen items and suggest the ones a user is most likely to enjoy.
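Turning predictions into a recommendation list can be as simple as sorting. The sketch below assumes predicted ratings have already been computed and stored in a dictionary; the function name `top_n`, the item names, and the scores are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def top_n(
    predicted: Dict[str, float],
    already_rated: Set[str],
    n: int = 2,
) -> List[Tuple[str, float]]:
    """Return the n unseen items with the highest predicted ratings."""
    candidates = [
        (item, score)
        for item, score in predicted.items()
        if item not in already_rated
    ]
    # Sort by predicted rating, highest first.
    return sorted(candidates, key=lambda pair: pair[1], reverse=True)[:n]


# Hypothetical predicted ratings for one user.
predicted = {"movie_a": 4.4, "movie_b": 3.1, "movie_c": 4.0, "movie_d": 4.9}
recommendations = top_n(predicted, already_rated={"movie_d"}, n=2)
print(recommendations)  # [('movie_a', 4.4), ('movie_c', 4.0)]
```

Items the user has already rated are filtered out first, so the list only contains unseen items.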

