Evaluating recommendation systems often requires metrics that go beyond the simple presence of relevant items. Many metrics struggle to account for the position of an item within a recommended list: an item ranked at position 1 is typically more valuable than one at position 10, yet some methods treat them equally. Furthermore, accurately assessing relevance means recognizing varying degrees of usefulness, rather than just a binary 'relevant' or 'not relevant' distinction.
In many real applications, relevance is not binary. A user might love one movie, like another, and find a third one merely acceptable. Normalized Discounted Cumulative Gain (NDCG) is a powerful ranking metric designed specifically for these scenarios. It evaluates the quality of a recommended list by incorporating two important ideas: graded relevance scores and the position of each item in the list.
To understand NDCG, we will build it up from its components: Cumulative Gain, Discounted Cumulative Gain, and finally, the normalized version.
Let's start with the simplest form, Cumulative Gain (CG). CG is the sum of the relevance scores of the items in the recommended list up to a certain rank $k$. It does not consider the order of the items, only their relevance.

The formula for CG at position $k$ is:

$$CG@k = \sum_{i=1}^{k} rel_i$$

Here, $rel_i$ is the relevance score of the item at position $i$. For example, if we have a top-3 list with relevance scores [5, 2, 3], the $CG@3$ is simply $5 + 2 + 3 = 10$. This tells us the total relevance we have accumulated, but it fails to reward the model for placing the most relevant item (score 5) at the top.
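As a quick sanity check, CG is just a sum over the relevance scores. The helper name `cumulative_gain` below is ours, chosen for illustration:

```python
def cumulative_gain(relevances):
    """Cumulative Gain: sum of relevance scores, ignoring position."""
    return sum(relevances)

print(cumulative_gain([5, 2, 3]))  # → 10
```

Note that any permutation of the same scores gives the same CG, which is exactly the weakness the next section addresses.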
To fix this, we introduce a penalty for placing relevant items lower in the list. This brings us to Discounted Cumulative Gain (DCG). DCG systematically discounts the relevance score of an item based on its rank. The most common way to do this is with a logarithmic discount.
The formula for DCG at position $k$ is:

$$DCG@k = \sum_{i=1}^{k} \frac{rel_i}{\log_2(i+1)}$$

The $\log_2(i+1)$ term in the denominator increases as the position $i$ increases. This means items at the top of the list (like position 1 or 2) have their relevance scores divided by a small number, while items further down are divided by a larger number, effectively "discounting" their contribution to the total score.
The discount factor applied to an item's relevance score decreases as its rank position increases: at position 1 it is $1/\log_2(2) = 1.0$, while at position 10 it is $1/\log_2(11) \approx 0.29$. The impact of an item at position 10 is therefore less than a third of an item at position 1.
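A short snippet makes the shape of this discount concrete, printing the factor $1/\log_2(i+1)$ for a few rank positions:

```python
import math

# Logarithmic discount factor 1 / log2(i + 1) at selected ranks
for i in [1, 2, 5, 10]:
    print(i, round(1 / math.log2(i + 1), 3))
```

Position 1 keeps its full relevance (factor 1.0), while position 10 retains only about 0.29 of its score.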
Let's revisit our example list with relevance scores [5, 2, 3]:

- Position 1: $5 / \log_2(2) = 5 / 1 = 5.0$
- Position 2: $2 / \log_2(3) \approx 2 / 1.585 \approx 1.262$
- Position 3: $3 / \log_2(4) = 3 / 2 = 1.5$

The $DCG@3$ is the sum of these contributions: $5.0 + 1.262 + 1.5 \approx 7.762$.
Now, consider what happens if our model produced a worse ranking, [2, 3, 5]:

- Position 1: $2 / \log_2(2) = 2.0$
- Position 2: $3 / \log_2(3) \approx 1.893$
- Position 3: $5 / \log_2(4) = 2.5$

The $DCG@3$ for this list is $2.0 + 1.893 + 2.5 \approx 6.393$. As you can see, the DCG score is lower, correctly penalizing the model for placing the most relevant item at the bottom of the list.
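The two calculations above can be reproduced with a small helper. The function name `dcg` is our own, written as a minimal sketch of the formula:

```python
import math

def dcg(relevances):
    """Discounted Cumulative Gain with a log2 position discount."""
    return sum(rel / math.log2(i + 1)
               for i, rel in enumerate(relevances, start=1))

print(round(dcg([5, 2, 3]), 3))  # good ranking, ~7.762
print(round(dcg([2, 3, 5]), 3))  # worse ranking, ~6.393
```

Same items, same relevance scores, different order: only the position discount separates the two results.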
DCG is a good position-aware metric, but it has one remaining problem: its value is not easily interpretable. The maximum possible DCG depends on the specific user and the available relevant items. A user with ten highly relevant items will have a much higher potential DCG than a user with only three marginally relevant items. This makes it difficult to average scores across users or compare performance on different datasets.
The solution is to normalize the DCG score. We do this by dividing the model's DCG by the Ideal Discounted Cumulative Gain (IDCG). The IDCG is the DCG of a perfect, or "ideal," ranking. To calculate it, you take all the relevant items for a user, sort them in descending order of relevance, and compute the DCG for that perfect list.
The final formula for Normalized Discounted Cumulative Gain (NDCG) is:

$$NDCG@k = \frac{DCG@k}{IDCG@k}$$
The resulting NDCG score is always a value between 0.0 and 1.0. An NDCG of 1.0 means the model's ranking is perfect, while a score of 0.0 means none of the recommended items are relevant.
Let's complete our example. The relevance scores for the user are [5, 2, 3].
Model's Ranking: [5, 2, 3], which gives $DCG@3 \approx 7.762$ as computed above.

Ideal Ranking: First, sort the relevance scores in descending order: [5, 3, 2]. This is the perfect ranking, and its DCG is $IDCG@3 = 5/\log_2(2) + 3/\log_2(3) + 2/\log_2(4) \approx 5 + 1.893 + 1 = 7.893$.

Calculate NDCG: $NDCG@3 = 7.762 / 7.893 \approx 0.983$.

This score is very close to 1.0, indicating that the model's ranking was nearly perfect.
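Putting the pieces together, here is a self-contained sketch of the full metric. The function names `dcg` and `ndcg` are ours; the ideal ranking is obtained by sorting the same relevance scores in descending order, as described above:

```python
import math

def dcg(relevances):
    """Discounted Cumulative Gain with a log2 position discount."""
    return sum(rel / math.log2(i + 1)
               for i, rel in enumerate(relevances, start=1))

def ndcg(ranked_relevances):
    """NDCG: the model's DCG divided by the DCG of the ideal ordering."""
    ideal = sorted(ranked_relevances, reverse=True)
    idcg = dcg(ideal)
    return dcg(ranked_relevances) / idcg if idcg > 0 else 0.0

print(round(ndcg([5, 2, 3]), 3))  # → 0.983
print(round(ndcg([5, 3, 2]), 3))  # perfect ordering → 1.0
```

The guard against a zero IDCG handles the edge case where a user has no relevant items at all. For production use, libraries such as scikit-learn provide an `ndcg_score` function with the same semantics.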
NDCG is one of the most informative offline metrics for evaluating ranked lists. You should use it when:

- Relevance is graded rather than binary, with varying degrees of usefulness.
- The position of items in the list matters, so top-ranked items should count for more.
- You need a normalized score that can be averaged across users or compared across datasets.
By capturing both relevance and position, NDCG provides a more complete picture of your recommender's performance than simpler metrics. It has become a standard metric in information retrieval and is an essential tool for anyone serious about optimizing a recommendation system.