While accuracy gives us a general sense of how often our model is right, it doesn't tell the whole story. Imagine an email spam filter. If only 1% of emails are actually spam, a model that always predicts "not spam" would be 99% accurate! That sounds great, but it completely fails at its primary task: catching spam. To get a better understanding, we need to break down the model's predictions into more specific categories.
For classification tasks, especially binary classification (where there are two possible outcomes, like "spam" vs. "not spam" or "disease" vs. "healthy"), we analyze predictions by comparing them to the actual, true values. We typically designate one class as "Positive" and the other as "Negative". Which class is positive is often determined by the focus of the problem (e.g., detecting spam, identifying a disease).
Based on this comparison, every prediction falls into one of four categories:
- **True Positives (TP):** Cases where the model correctly predicts the positive class.
- **False Positives (FP):** Cases where the model incorrectly predicts the positive class when the actual class is negative. This is sometimes called a "Type I Error".
- **True Negatives (TN):** Cases where the model correctly predicts the negative class.
- **False Negatives (FN):** Cases where the model incorrectly predicts the negative class when the actual class is positive. This is sometimes called a "Type II Error".
We can summarize these four possibilities in a table format, which forms the basis of the Confusion Matrix we will discuss next:
| | Predicted: Positive | Predicted: Negative |
|---|---|---|
| **Actual: Positive** | True Positive (TP) | False Negative (FN) |
| **Actual: Negative** | False Positive (FP) | True Negative (TN) |
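To make these four boxes concrete, here is a minimal sketch in Python. The toy labels and predictions are illustrative assumptions, with 1 marking the positive class and 0 the negative class:

```python
# Toy ground-truth labels and model predictions (1 = positive, 0 = negative).
y_true = [1, 1, 0, 0, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1, 0, 0, 0]

# Count how many predictions fall into each of the four boxes.
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

print(f"TP={tp}, FN={fn}, FP={fp}, TN={tn}")
# TP=2, FN=2, FP=1, TN=3
```

Each pair of (actual, predicted) values satisfies exactly one of the four conditions, so every prediction is counted exactly once.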
Think about the total number of predictions your model makes. Every single prediction fits into one of these four boxes (TP, FP, TN, FN). Therefore, the total number of instances is:
$$\text{Total} = TP + FP + TN + FN$$
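Continuing the sketch above, we can check this identity directly:

```python
# The four counts partition the predictions, so they sum to the total.
assert tp + fp + tn + fn == len(y_true)  # 2 + 1 + 3 + 2 == 8
```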
Accuracy, which we discussed earlier, can also be defined using these terms:
$$\text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN}$$
This shows that accuracy simply measures the proportion of correct predictions (both positive and negative) out of all predictions made.
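Expressed in code, accuracy is just this ratio. The sketch below uses made-up counts to reproduce the spam-filter trap from the introduction:

```python
def accuracy(tp, fp, tn, fn):
    """Proportion of correct predictions among all predictions."""
    return (tp + tn) / (tp + fp + tn + fn)

# Continuing the toy example above:
print(accuracy(tp=2, fp=1, tn=3, fn=2))  # 0.625

# The spam-filter trap: 1,000 emails, 10 of them spam (1%),
# and a model that always predicts "not spam".
# It never raises a false alarm (FP=0) but misses every spam email (FN=10).
print(accuracy(tp=0, fp=0, tn=990, fn=10))  # 0.99
```

The second model scores 99% accuracy while catching zero spam, which is exactly why we need to look at the individual error types.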
Breaking down predictions into these four categories is fundamental because different types of errors often have very different consequences. In our spam example (Positive class = Spam):

- A **False Positive** means a legitimate email is sent to the spam folder, so the user might miss an important message.
- A **False Negative** means a spam email slips into the inbox, which is annoying but usually less harmful than losing a real email.
In other scenarios, the cost imbalance can be much more severe. Consider a medical diagnosis model predicting a serious disease (Positive class = Disease):

- A **False Positive** means a healthy patient is flagged as sick, leading to anxiety and unnecessary follow-up testing.
- A **False Negative** means a sick patient is told they are healthy, delaying treatment with potentially life-threatening consequences.
Clearly, depending on the application, we might want to minimize one type of error more than the other. Understanding TP, FP, TN, and FN allows us to move beyond simple accuracy and choose metrics like Precision and Recall (discussed next) that better reflect the specific goals and constraints of our classification problem.