In the previous sections, we learned about precision and recall as ways to measure classification performance. Precision tells us how many of the items we identified as positive are actually positive (TP/(TP+FP)). Recall tells us how many of the actual positive items we correctly identified (TP/(TP+FN)).
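As a quick refresher, here is a worked example with made-up counts:

```python
# Worked example with hypothetical counts: 80 TP, 20 FP, 40 FN
tp, fp, fn = 80, 20, 40

precision = tp / (tp + fp)  # 80 / 100 = 0.80
recall = tp / (tp + fn)     # 80 / 120 ≈ 0.67

print(f"precision={precision:.2f}, recall={recall:.2f}")
```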
You might wonder: can we simply maximize both? Ideally, yes. In practice, however, improving precision often leads to a decrease in recall, and vice versa. This inverse relationship is known as the precision-recall trade-off.
Many classification models don't just output a final class label (like "Spam" or "Not Spam"). Instead, they often output a probability or a confidence score indicating how likely an instance belongs to the positive class. A decision threshold is then used to convert this score into a final classification. For example, if the score is above 0.5, classify as "Spam"; otherwise, classify as "Not Spam".
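Here is a minimal sketch of that conversion step, assuming a hypothetical array of model scores (for example, the positive-class column of a scikit-learn classifier's `predict_proba` output):

```python
import numpy as np

# Hypothetical confidence scores for five emails
scores = np.array([0.91, 0.42, 0.77, 0.08, 0.55])

# Convert scores to hard labels using a decision threshold
threshold = 0.5
labels = (scores > threshold).astype(int)  # 1 = "Spam", 0 = "Not Spam"
print(labels)  # -> [1 0 1 0 1]
```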
This threshold is the key to understanding the trade-off. Let's see what happens when we adjust it:
Increasing the Threshold: If we require the model to be more confident before classifying an item as positive (e.g., require a score > 0.8), we will likely have fewer false positives (FP), because only items the model is very sure about are classified as positive. This tends to increase precision (TP/(TP+FP)): the positive predictions that remain are the model's most confident ones, so a larger fraction of them are correct. However, we will also miss actual positive cases whose scores fall between 0.5 and 0.8 (more false negatives, FN). This decreases recall (TP/(TP+FN)) because the denominator (TP+FN) grows as FN grows.
Decreasing the Threshold: If we require less confidence before classifying an item as positive (e.g., require only a score > 0.3), we will catch more of the actual positive cases, including those the model wasn't very sure about. This reduces the number of false negatives (FN) and therefore increases recall (TP/(TP+FN)). But this leniency also means we incorrectly classify more negative items as positive (more false positives, FP). Because these extra, low-confidence predictions are more error-prone, FP tends to grow faster than TP, lowering precision (TP/(TP+FP)).
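To see both effects at once, the sketch below evaluates a small, made-up set of labels and scores at three thresholds:

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score

# Toy ground-truth labels and model scores, invented for illustration
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 1, 0, 0])
scores = np.array([0.95, 0.85, 0.72, 0.55, 0.62, 0.45, 0.30, 0.35, 0.15, 0.05])

for threshold in (0.3, 0.5, 0.8):
    y_pred = (scores > threshold).astype(int)
    p = precision_score(y_true, y_pred)
    r = recall_score(y_true, y_pred)
    print(f"threshold={threshold:.1f}  precision={p:.2f}  recall={r:.2f}")
```

On this toy data, raising the threshold from 0.3 to 0.8 moves precision from roughly 0.71 up to 1.00, while recall drops from 1.00 down to 0.40.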
Think of it like casting a net: a wide net (low threshold) scoops up nearly every fish (high recall) but also plenty of debris (low precision), while a small, carefully placed net (high threshold) brings in mostly fish (high precision) but misses many of them (low recall).
We can often visualize this relationship with a Precision-Recall curve. This curve plots precision (y-axis) against recall (x-axis) for different threshold values. Typically, as recall increases, precision decreases.
This plot shows a typical inverse relationship between precision and recall. Moving along the curve represents changing the decision threshold.
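One way to generate such a curve is with scikit-learn's `precision_recall_curve`, sketched here on the same toy data as above; each point on the curve corresponds to one candidate threshold:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import precision_recall_curve

# Same toy labels and scores as in the earlier sketch
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 1, 0, 0])
scores = np.array([0.95, 0.85, 0.72, 0.55, 0.62, 0.45, 0.30, 0.35, 0.15, 0.05])

# precision and recall have one more entry than thresholds:
# a final (recall=0, precision=1) point is appended to close the curve
precision, recall, thresholds = precision_recall_curve(y_true, scores)

plt.plot(recall, precision, marker=".")
plt.xlabel("Recall")
plt.ylabel("Precision")
plt.title("Precision-Recall Curve")
plt.show()
```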
The "best" balance between precision and recall depends entirely on the specific problem and the costs associated with different types of errors:
Prioritize Precision when False Positives are costly: in spam filtering, for example, flagging a legitimate email as spam (a false positive) may cause a user to miss something important, so positive predictions need to be highly reliable.
Prioritize Recall when False Negatives are costly: in medical screening, for example, failing to flag a patient who actually has the disease (a false negative) can delay treatment, so we want to catch as many true positives as possible, even at the cost of some false alarms.
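In practice, this often translates into choosing an operating threshold that satisfies a hard constraint on the costlier error. The sketch below, again on the toy data from earlier, picks the threshold with the highest precision among those meeting a minimum recall target (the `target_recall` value is an arbitrary choice for illustration):

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

y_true = np.array([1, 1, 1, 1, 0, 0, 0, 1, 0, 0])
scores = np.array([0.95, 0.85, 0.72, 0.55, 0.62, 0.45, 0.30, 0.35, 0.15, 0.05])

precision, recall, thresholds = precision_recall_curve(y_true, scores)

# Drop the final appended point so the arrays align with `thresholds`
target_recall = 0.9
meets_target = recall[:-1] >= target_recall

# Zero out candidates that miss the recall target, then take the best precision
# (assumes at least one threshold meets the target)
best = np.argmax(np.where(meets_target, precision[:-1], 0.0))
print(f"threshold={thresholds[best]:.2f}  "
      f"precision={precision[best]:.2f}  recall={recall[best]:.2f}")
```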
Understanding this trade-off is fundamental. It highlights that there's often no single "perfect" model according to all metrics simultaneously. Instead, you need to choose a model and potentially adjust its operating threshold based on the specific needs and consequences related to your application. The F1-score, which we discuss next, offers one way to find a balance, but knowing why you might lean towards precision or recall is essential context.