Evaluating recommendation systems often requires metrics that go beyond the simple presence of relevant items. Many metrics struggle to account for the position of an item within a recommended list: an item ranked at position 1 is typically more valuable than one at position 10, yet some methods treat them equally. Furthermore, accurately assessing relevance means recognizing varying degrees of usefulness, rather than just a binary 'relevant' or 'not relevant' distinction.
In many real applications, relevance is not binary. A user might love one movie, like another, and find a third one merely acceptable. Normalized Discounted Cumulative Gain (NDCG) is a powerful ranking metric designed specifically for these scenarios. It evaluates the quality of a recommended list by incorporating two important ideas: graded relevance scores and the position of each item in the list.
To understand NDCG, we will build it up from its components: Cumulative Gain, Discounted Cumulative Gain, and finally, the normalized version.
Let's start with the simplest form, Cumulative Gain (CG). CG is the sum of the relevance scores of the items in the recommended list up to a certain rank $k$. It does not consider the order of the items, only their relevance.

The formula for CG at position $k$ is:

$$CG@k = \sum_{i=1}^{k} rel_i$$

Here, $rel_i$ is the relevance score of the item at position $i$. For example, if we have a top-3 list with relevance scores [5, 2, 3], the $CG@3$ is simply $5 + 2 + 3 = 10$. This tells us the total relevance we have accumulated, but it fails to reward the model for placing the most relevant item (score 5) at the top.
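As a quick sanity check, CG is just a sum over the relevance scores. The helper name `cumulative_gain` below is ours, chosen for illustration:

```python
def cumulative_gain(relevances):
    """Cumulative Gain: sum of relevance scores, ignoring position."""
    return sum(relevances)

print(cumulative_gain([5, 2, 3]))  # → 10
```

Note that any permutation of the same scores gives the same CG, which is exactly the weakness the next section addresses.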
To fix this, we introduce a penalty for placing relevant items lower in the list. This brings us to Discounted Cumulative Gain (DCG). DCG systematically discounts the relevance score of an item based on its rank. The most common way to do this is with a logarithmic discount.
The formula for DCG at position $k$ is:

$$DCG@k = \sum_{i=1}^{k} \frac{rel_i}{\log_2(i+1)}$$

The $\log_2(i+1)$ term in the denominator increases as the position $i$ increases. This means items at the top of the list (like position 1 or 2) have their relevance scores divided by a small number, while items further down are divided by a larger number, effectively "discounting" their contribution to the total score.
The discount factor applied to an item's relevance score decreases as its rank position increases: at position 1 it is $1/\log_2(2) = 1.0$, while at position 10 it is $1/\log_2(11) \approx 0.29$. The impact of an item at position 10 is therefore less than a third of an item at position 1.
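A short snippet makes the shape of this discount concrete, printing the factor $1/\log_2(i+1)$ for a few rank positions:

```python
import math

# Logarithmic discount factor 1 / log2(i + 1) at selected ranks
for i in [1, 2, 5, 10]:
    print(i, round(1 / math.log2(i + 1), 3))
```

Position 1 keeps its full relevance (factor 1.0), while position 10 retains only about 0.29 of its score.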
Let's revisit our example list with relevance scores [5, 2, 3]:

- Position 1: $5 / \log_2(2) = 5 / 1 = 5.0$
- Position 2: $2 / \log_2(3) \approx 2 / 1.585 \approx 1.262$
- Position 3: $3 / \log_2(4) = 3 / 2 = 1.5$

The $DCG@3$ is the sum of these contributions: $5.0 + 1.262 + 1.5 \approx 7.762$.
Now, consider what happens if our model produced a worse ranking, [2, 3, 5]:

- Position 1: $2 / \log_2(2) = 2.0$
- Position 2: $3 / \log_2(3) \approx 1.893$
- Position 3: $5 / \log_2(4) = 2.5$

The $DCG@3$ for this list is $2.0 + 1.893 + 2.5 \approx 6.393$. As you can see, the DCG score is lower, correctly penalizing the model for placing the most relevant item at the bottom of the list.
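The two calculations above can be reproduced with a small helper. The function name `dcg` is our own, written as a minimal sketch of the formula:

```python
import math

def dcg(relevances):
    """Discounted Cumulative Gain with a log2 position discount."""
    return sum(rel / math.log2(i + 1)
               for i, rel in enumerate(relevances, start=1))

print(round(dcg([5, 2, 3]), 3))  # good ranking, ~7.762
print(round(dcg([2, 3, 5]), 3))  # worse ranking, ~6.393
```

Same items, same relevance scores, different order: only the position discount separates the two results.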
DCG is a good position-aware metric, but it has one remaining problem: its value is not easily interpretable. The maximum possible DCG depends on the specific user and the available relevant items. A user with ten highly relevant items will have a much higher potential DCG than a user with only three marginally relevant items. This makes it difficult to average scores across users or compare performance on different datasets.
The solution is to normalize the DCG score. We do this by dividing the model's DCG by the Ideal Discounted Cumulative Gain (IDCG). The IDCG is the DCG of a perfect, or "ideal," ranking. To calculate it, you take all the relevant items for a user, sort them in descending order of relevance, and compute the DCG for that perfect list.
The final formula for Normalized Discounted Cumulative Gain (NDCG) is:

$$NDCG@k = \frac{DCG@k}{IDCG@k}$$
The resulting NDCG score is always a value between 0.0 and 1.0. An NDCG of 1.0 means the model's ranking is perfect, while a score of 0.0 means none of the recommended items are relevant.
Let's complete our example. The relevance scores for the user are [5, 2, 3].
Model's Ranking: [5, 2, 3], which gives $DCG@3 \approx 7.762$ as computed above.

Ideal Ranking: First, sort the relevance scores in descending order: [5, 3, 2]. This is the perfect ranking, and its DCG is $IDCG@3 = 5/\log_2(2) + 3/\log_2(3) + 2/\log_2(4) \approx 5 + 1.893 + 1 = 7.893$.

Calculate NDCG: $NDCG@3 = 7.762 / 7.893 \approx 0.983$.

This score is very close to 1.0, indicating that the model's ranking was nearly perfect.
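Putting the pieces together, here is a self-contained sketch of the full metric. The function names `dcg` and `ndcg` are ours; the ideal ranking is obtained by sorting the same relevance scores in descending order, as described above:

```python
import math

def dcg(relevances):
    """Discounted Cumulative Gain with a log2 position discount."""
    return sum(rel / math.log2(i + 1)
               for i, rel in enumerate(relevances, start=1))

def ndcg(ranked_relevances):
    """NDCG: the model's DCG divided by the DCG of the ideal ordering."""
    ideal = sorted(ranked_relevances, reverse=True)
    idcg = dcg(ideal)
    return dcg(ranked_relevances) / idcg if idcg > 0 else 0.0

print(round(ndcg([5, 2, 3]), 3))  # → 0.983
print(round(ndcg([5, 3, 2]), 3))  # perfect ordering → 1.0
```

The guard against a zero IDCG handles the edge case where a user has no relevant items at all. For production use, libraries such as scikit-learn provide an `ndcg_score` function with the same semantics.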
NDCG is one of the most informative offline metrics for evaluating ranked lists. You should use it when:

- Relevance is graded rather than binary, with varying degrees of usefulness.
- The position of items in the list matters, so top-ranked items should count for more.
- You need a normalized score that can be averaged across users or compared across datasets.
By capturing both relevance and position, NDCG provides a more complete picture of your recommender's performance than simpler metrics. It has become a standard metric in information retrieval and is an essential tool for anyone serious about optimizing a recommendation system.