While accuracy gives us a general sense of how often our classification model is correct, it doesn't tell the whole story, especially when dealing with uneven class distributions or when the consequences of different types of errors vary significantly. We need metrics that provide more specific insights. One such metric is precision.
Imagine you've built an email spam filter. Accuracy tells you the overall percentage of emails classified correctly (spam as spam, not spam as not spam). But you might be particularly interested in this question: Of all the emails that the filter put into the spam folder, how many were actually spam? You wouldn't want your filter to aggressively label important emails as spam. This focus on the correctness of positive predictions is exactly what precision measures.
Precision answers the question: Out of all the instances the model predicted to be positive, what fraction were actually positive?
It focuses on the predictions your model made for the positive class. Think of it as a measure of exactness or quality. A high precision score means that when your model predicts an instance belongs to the positive class, it is very likely correct.
To calculate precision, we use two values from the confusion matrix: True Positives (TP), the positive instances the model correctly predicted as positive, and False Positives (FP), the negative instances the model incorrectly predicted as positive.

The formula for precision is:
$$\text{Precision} = \frac{TP}{TP + FP}$$

Notice that the denominator (TP + FP) represents the total number of instances your model predicted as positive. Precision is the ratio of the correctly predicted positive instances (TP) to the total number predicted as positive.
The components used to calculate precision. It focuses solely on the instances classified as positive by the model (TP+FP).
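As a quick sketch, the formula translates directly into code. The helper below is illustrative rather than a standard library function; the name and the choice of returning 0.0 when the model makes no positive predictions (where precision is formally undefined) are our own conventions.

```python
def precision(tp: int, fp: int) -> float:
    """Precision = TP / (TP + FP): the fraction of predicted positives
    that are actually positive."""
    if tp + fp == 0:
        # The model made no positive predictions, so precision is undefined.
        # Returning 0.0 is a common convention, but it is a choice.
        return 0.0
    return tp / (tp + fp)
```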
Let's return to our spam filter example. Suppose after testing the filter on 1000 emails, we get the following confusion matrix:
| | Predicted: Spam | Predicted: Not Spam | Total Actual |
|---|---|---|---|
| Actual: Spam | TP = 95 | FN = 5 | 100 |
| Actual: Not Spam | FP = 10 | TN = 890 | 900 |
| Total Predicted | 105 | 895 | 1000 |
To calculate precision, we need TP and FP from the matrix: TP = 95 and FP = 10.
Now, apply the formula:
$$\text{Precision} = \frac{TP}{TP + FP} = \frac{95}{95 + 10} = \frac{95}{105} \approx 0.905$$

So, the precision of our spam filter is approximately 0.905, or 90.5%. This means that when the filter marks an email as spam, it is correct about 90.5% of the time.
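If you work with scikit-learn, the same value can be obtained with `precision_score`. The snippet below is a small sketch: it rebuilds label arrays that reproduce the confusion matrix above (1 = spam, 0 = not spam) and checks that the computed precision matches our hand calculation.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score

# Reconstruct labels that match the confusion matrix above.
# Only the counts matter, not the ordering of individual emails.
y_true = np.array([1] * 100 + [0] * 900)          # 100 actual spam, 900 actual not spam
y_pred = np.array([1] * 95 + [0] * 5              # actual spam: 95 caught (TP), 5 missed (FN)
                  + [1] * 10 + [0] * 890)         # actual not spam: 10 flagged (FP), 890 correct (TN)

print(confusion_matrix(y_true, y_pred))
# [[890  10]
#  [  5  95]]

print(precision_score(y_true, y_pred))
# 0.9047... (approximately 0.905)
```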
High precision is particularly desirable when the cost of a False Positive (FP) is high. In the spam filter, for example, a false positive means a legitimate, possibly important, email gets buried in the spam folder. In cases like this, we want to minimize False Positives, which translates to maximizing precision.
Precision gives us valuable information about the reliability of positive predictions, but it doesn't consider False Negatives (FN), the positive instances that the model incorrectly classified as negative. In our spam example, FN=5, meaning 5 actual spam emails slipped through the filter into the inbox. If minimizing these missed positive instances is important, we need to look at another metric: Recall. We'll examine Recall in the next section.