Precision@k measures how many relevant items appear in top recommendations, but it has a significant blind spot: it ignores the order of those items. According to Precision@5, a recommendation list that places a relevant item at position 1 is scored identically to one that places it at position 5. This is not ideal in most applications. Users are far more likely to interact with items at the top of a list, so a good recommender should be rewarded for ranking relevant items higher.
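To see the blind spot concretely, here is a minimal sketch (the precision_at_k helper and the movie lists are illustrative, not taken from any library): both lists contain exactly one relevant movie among their top 5, so Precision@5 scores them identically even though one list buries the relevant item at the bottom.

def precision_at_k(recommended_items, relevant_items, k):
    """Fraction of the top-k recommended items that are relevant."""
    top_k = recommended_items[:k]
    return sum(1 for item in top_k if item in relevant_items) / k

relevant = {'Movie A'}
list_top = ['Movie A', 'Movie X', 'Movie Y', 'Movie Z', 'Movie W']     # relevant item first
list_bottom = ['Movie X', 'Movie Y', 'Movie Z', 'Movie W', 'Movie A']  # relevant item last

print(precision_at_k(list_top, relevant, 5))     # 0.2
print(precision_at_k(list_bottom, relevant, 5))  # 0.2, same score despite the worse ranking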
This is precisely the problem that Mean Average Precision (MAP) is designed to address. It is a ranking metric that heavily penalizes models for placing relevant items further down the recommendation list.
To understand MAP, we first need to look at Average Precision (AP), which is calculated on a per-user basis. Average Precision for a single user is the average of the Precision@k values computed at each position k that contains a relevant item.
Let's walk through an example. Suppose our model generates a ranked list of 6 movie recommendations for a user. We also know the ground truth, which is the set of movies this user actually watched and liked from our test set.
Recommended list (Model 1): [Movie C, Movie A, Movie F, Movie B, Movie H, Movie D]

Ground truth (relevant movies): {Movie A, Movie B, Movie D}

Now, we iterate through the recommended list and calculate the precision only at the positions where we find a relevant movie:

- Position 2 (Movie A): 1 relevant item among the top 2, so Precision@2 = 1/2 = 0.5
- Position 4 (Movie B): 2 relevant items among the top 4, so Precision@4 = 2/4 = 0.5
- Position 6 (Movie D): 3 relevant items among the top 6, so Precision@6 = 3/6 = 0.5
To get the Average Precision for this user, we average these precision scores. Since there were 3 relevant items in total, we divide the sum of our calculated precisions by 3.
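For Model 1, this gives:

$$AP_{\text{Model 1}} = \frac{0.5 + 0.5 + 0.5}{3} = 0.5$$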
Now, consider a better model that ranks the same relevant items higher.
Recommended list (Model 2): [Movie A, Movie B, Movie C, Movie F, Movie D, Movie H]

Ground truth (relevant movies): {Movie A, Movie B, Movie D}

Let's calculate the AP for this new list:

- Position 1 (Movie A): 1 relevant item among the top 1, so Precision@1 = 1/1 = 1.0
- Position 2 (Movie B): 2 relevant items among the top 2, so Precision@2 = 2/2 = 1.0
- Position 5 (Movie D): 3 relevant items among the top 5, so Precision@5 = 3/5 = 0.6
Now, we find the average:
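$$AP_{\text{Model 2}} = \frac{1.0 + 1.0 + 0.6}{3} \approx 0.867$$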
The AP score for Model 2 is significantly higher, correctly reflecting that it produced a better-ordered list of recommendations.
The following diagram illustrates the calculation for both models, showing how higher ranks for relevant items lead to a better AP score.
Comparing Average Precision for two different recommendation models for the same user. Model 2 achieves a higher score because it ranks the relevant items (A, B, and D) higher on the list.
The formal definition for Average Precision is:

$$AP = \frac{1}{|R|} \sum_{k=1}^{N} P(k) \cdot rel(k)$$

Where:

- $|R|$ is the total number of relevant items for the user.
- $N$ is the number of items in the recommendation list.
- $P(k)$ is the precision computed at cutoff $k$.
- $rel(k)$ is an indicator function equal to 1 if the item at position $k$ is relevant and 0 otherwise.
Average Precision gives us a score for a single user. To get a single metric that describes the performance of our entire model, we calculate the AP for every user in our test set and then take the average of all these scores. This final value is the Mean Average Precision (MAP).
$$MAP = \frac{1}{|U|} \sum_{u=1}^{|U|} AP_u$$

Where:

- $|U|$ is the number of users in the test set.
- $AP_u$ is the Average Precision computed for user $u$.
A MAP score ranges from 0 to 1, where a higher value indicates a better model. A score of 1.0 would mean that the model perfectly ranked all relevant items at the very top of the list for every single user.
Let's translate the logic into a simple Python function. This function takes a list of recommended items and a set of relevant items and computes the AP.
import numpy as np

def average_precision(recommended_items, relevant_items):
    """
    Calculates the Average Precision (AP) for a single recommendation list.

    Args:
        recommended_items (list): A ranked list of recommended item IDs.
        relevant_items (set): A set of relevant item IDs (ground truth).

    Returns:
        float: The Average Precision score.
    """
    if not relevant_items:
        return 0.0

    # Store the precision value at each position that holds a relevant item
    precision_scores = []
    num_hits = 0

    for i, item_id in enumerate(recommended_items):
        if item_id in relevant_items:
            num_hits += 1
            precision_at_k = num_hits / (i + 1)  # precision at position i + 1
            precision_scores.append(precision_at_k)

    if not precision_scores:
        return 0.0

    # AP: sum of precisions at the relevant positions, divided by the
    # total number of relevant items (|R| in the definition above)
    return np.sum(precision_scores) / len(relevant_items)


# Example from Model 2: hits at positions 1, 2, and 5
recommended = ['Movie A', 'Movie B', 'Movie C', 'Movie F', 'Movie D', 'Movie H']
relevant = {'Movie A', 'Movie B', 'Movie D'}

ap_score = average_precision(recommended, relevant)
print(f"AP Score for Model 2: {ap_score:.4f}")
# Expected output: AP Score for Model 2: 0.8667
To get the MAP score for your system, you would run this function for each user in your test set and then compute the mean of all the returned AP scores. MAP is a standard and effective metric for any recommendation task where the ranking of items is a primary concern.
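As a minimal sketch, here is how that aggregation might look. It reuses the average_precision function defined above; the mean_average_precision name and the two dictionaries (user ID to ranked recommendations, user ID to ground-truth sets) are illustrative assumptions about how your test data is organized.

def mean_average_precision(recommendations_by_user, relevant_by_user):
    """
    Computes MAP across all users in the test set.

    Args:
        recommendations_by_user (dict): Maps user ID to a ranked list of recommended item IDs.
        relevant_by_user (dict): Maps user ID to a set of relevant item IDs (ground truth).

    Returns:
        float: The Mean Average Precision score.
    """
    ap_scores = [
        average_precision(recommendations_by_user[user], relevant_by_user[user])
        for user in relevant_by_user
    ]
    return sum(ap_scores) / len(ap_scores) if ap_scores else 0.0

# Hypothetical test set with two users: one gets Model 2's list, the other Model 1's
recommendations_by_user = {
    'user_1': ['Movie A', 'Movie B', 'Movie C', 'Movie F', 'Movie D', 'Movie H'],
    'user_2': ['Movie C', 'Movie A', 'Movie F', 'Movie B', 'Movie H', 'Movie D'],
}
relevant_by_user = {
    'user_1': {'Movie A', 'Movie B', 'Movie D'},
    'user_2': {'Movie A', 'Movie B', 'Movie D'},
}

map_score = mean_average_precision(recommendations_by_user, relevant_by_user)
print(f"MAP: {map_score:.4f}")
# Expected output: MAP: 0.6833  ->  (0.8667 + 0.5) / 2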