You've learned about precision and recall, two important metrics for evaluating classification models. Precision tells you how many of the positive predictions your model made were actually correct, while recall tells you how many of the actual positive cases your model managed to identify. As we discussed in the previous section, there's often a trade-off: improving precision can sometimes lower recall, and vice versa.
So, what if you need a single number that summarizes both? What if both finding all the relevant items (high recall) and ensuring the items found are relevant (high precision) are important for your application? This is where the F1-score comes in handy.
The F1-score is a way to combine precision and recall into a single metric. It's calculated as the harmonic mean of precision and recall.
You might wonder, why not just take a simple average (arithmetic mean) of precision and recall? The harmonic mean has a useful property: it is pulled strongly toward the lower of the two values. A simple average might look good even if one metric is poor, but the harmonic mean penalizes this imbalance.
Consider a model with:

- Precision = 0.9
- Recall = 0.1
The arithmetic mean would be $(0.9 + 0.1)/2 = 0.5$. This score doesn't seem too bad.
However, the F1-score (as we'll see how to calculate next) would be much lower, reflecting the poor recall. The harmonic mean pulls the combined score closer to the lower value, ensuring that a model must perform reasonably well on both precision and recall to achieve a high F1-score. It only scores high if both precision and recall are high.
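As a quick numerical check, here is a minimal Python sketch (using the illustrative values 0.9 and 0.1 from above) comparing the arithmetic mean with the harmonic mean that the F1-score is built on:

```python
from statistics import harmonic_mean

# Illustrative values: high precision, very low recall
precision = 0.9
recall = 0.1

arithmetic = (precision + recall) / 2          # simple average
harmonic = harmonic_mean([precision, recall])  # what the F1-score is based on

print(f"Arithmetic mean: {arithmetic:.2f}")  # 0.50 -- looks acceptable
print(f"Harmonic mean:   {harmonic:.2f}")    # 0.18 -- exposes the weak recall
```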
The formula for the F1-score is:
$$F_1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$

Let's use our previous example (Precision = 0.9, Recall = 0.1):
$$F_1 = 2 \times \frac{0.9 \times 0.1}{0.9 + 0.1} = 2 \times \frac{0.09}{1.0} = 0.18$$

Notice how the F1-score (0.18) is much lower than the arithmetic mean (0.5) and closer to the lower metric (Recall = 0.1). This highlights the model's weakness in recall, which the simple average masked.
You can also express the F1-score directly using True Positives (TP), False Positives (FP), and False Negatives (FN):
$$F_1 = \frac{2 \times TP}{2 \times TP + FP + FN}$$

This formula is derived by substituting the definitions of precision ($\text{Precision} = \frac{TP}{TP + FP}$) and recall ($\text{Recall} = \frac{TP}{TP + FN}$) into the first F1-score equation.
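As a small sketch (the counts below are arbitrary, illustrative numbers), you can verify that the count-based formula gives the same result as the precision/recall form:

```python
# Arbitrary illustrative counts from a hypothetical confusion matrix
tp, fp, fn = 50, 15, 25

# Count-based form
f1_counts = (2 * tp) / (2 * tp + fp + fn)

# Precision/recall form
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1_pr = 2 * precision * recall / (precision + recall)

print(f"{f1_counts:.4f} == {f1_pr:.4f}")  # both give the same F1-score
```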
Like precision and recall, the F1-score ranges from 0 to 1.
A higher F1-score indicates a better balance between precision and recall. It's a particularly useful metric when:

- The classes are imbalanced, so one class is much rarer than the other.
- Both types of errors (false positives and false negatives) carry meaningful costs, and you need a single number that reflects both.
Let's revisit a confusion matrix:
| | Predicted Positive | Predicted Negative |
|---|---|---|
| Actual Positive | TP = 80 | FN = 20 |
| Actual Negative | FP = 10 | TN = 90 |
From this, we calculate:

- Precision $= \frac{TP}{TP + FP} = \frac{80}{80 + 10} \approx 0.89$
- Recall $= \frac{TP}{TP + FN} = \frac{80}{80 + 20} = 0.80$
Now, we calculate the F1-score:
$$F_1 = 2 \times \frac{0.89 \times 0.80}{0.89 + 0.80} = 2 \times \frac{0.712}{1.69} \approx 2 \times 0.421 \approx 0.84$$

An F1-score of 0.84 suggests a good balance between precision (0.89) and recall (0.80).
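If you use scikit-learn (an assumption here; nothing above requires it), a sketch that reproduces this confusion matrix with synthetic labels and checks the metrics might look like this:

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score, f1_score

# Synthetic labels chosen to reproduce the confusion matrix above:
# 100 actual positives (80 found, 20 missed) and 100 actual negatives (10 false alarms)
y_true = [1] * 100 + [0] * 100
y_pred = [1] * 80 + [0] * 20 + [1] * 10 + [0] * 90

print(confusion_matrix(y_true, y_pred))                      # [[90 10], [20 80]]
print(f"Precision: {precision_score(y_true, y_pred):.2f}")   # 0.89
print(f"Recall:    {recall_score(y_true, y_pred):.2f}")      # 0.80
print(f"F1-score:  {f1_score(y_true, y_pred):.2f}")          # 0.84
```

The library's output agrees with the hand calculation: precision ≈ 0.89, recall = 0.80, and F1 ≈ 0.84.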
In summary, the F1-score provides a single, convenient metric that summarizes a classifier's performance by balancing precision and recall. It's particularly valuable when dealing with imbalanced classes or when both types of classification errors (false positives and false negatives) need to be minimized.