Building upon our understanding of sample spaces and events, let's consider how the probability of an event can change when we know that another event has already occurred. This leads us to the important concept of conditional probability.
Often, we are interested in the probability of an event A happening given that we know event B has happened. This is called the conditional probability of A given B, and it's denoted as P(A∣B). Think of it as updating our probability estimate based on new information (event B).
The core idea is that the occurrence of event B effectively reduces our sample space. We are no longer considering all possible outcomes in the original sample space S; instead, we are focusing only on the outcomes within event B. Within this reduced sample space, we want to find the probability of outcomes that also belong to event A. These are the outcomes in the intersection A∩B.
The formal definition of conditional probability is:
P(A∣B) = P(A∩B) / P(B)
This formula holds provided that P(B)>0 (we cannot condition on an event that has zero probability of occurring). P(A∩B) represents the probability that both A and B occur.
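As a minimal sketch of how this definition plays out on a small, equally likely sample space, consider one roll of a fair six-sided die (an illustrative example introduced here, not taken from the email scenario below), with A = "the roll is even" and B = "the roll is greater than 3":

```python
from fractions import Fraction

# One roll of a fair six-sided die, all outcomes equally likely
sample_space = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}  # event A: the roll is even
B = {4, 5, 6}  # event B: the roll is greater than 3

def prob(event):
    """Probability of an event under equally likely outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))

# Conditional probability via the definition P(A|B) = P(A ∩ B) / P(B)
p_a_given_b = prob(A & B) / prob(B)

# Equivalent view: treat B as the reduced sample space and count A ∩ B inside it
p_a_given_b_counted = Fraction(len(A & B), len(B))

print(p_a_given_b, p_a_given_b_counted)  # 2/3 2/3
```

Both routes give 2/3, illustrating that conditioning on B simply re-weights the outcomes that lie inside B.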
Imagine we're analyzing emails to classify them as spam or not spam (ham). Let S be the event that an email is spam, and let W be the event that an email contains the word "winner". Suppose we have the following probabilities from a large dataset:

P(S) = 0.2 (20% of all emails are spam)
P(W) = 0.10 (10% of all emails contain the word "winner")
P(S∩W) = 0.08 (8% of all emails are spam and contain the word "winner")
What is the probability that an email is spam given that we know it contains the word "winner"? We want to calculate P(S∣W).
Using the formula:
P(S∣W) = P(S∩W) / P(W) = 0.08 / 0.10 = 0.8
So, if we know an email contains the word "winner", the probability of it being spam increases significantly from the baseline P(S)=0.2 to P(S∣W)=0.8. This kind of calculation is fundamental in building spam filters.
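This calculation is easy to verify in a few lines of Python, a minimal sketch using the probabilities stated above:

```python
# Probabilities from the spam example above
p_spam = 0.2              # P(S): an email is spam
p_winner = 0.10           # P(W): an email contains the word "winner"
p_spam_and_winner = 0.08  # P(S ∩ W): an email is spam AND contains "winner"

# Conditional probability: P(S | W) = P(S ∩ W) / P(W)
p_spam_given_winner = p_spam_and_winner / p_winner
print(round(p_spam_given_winner, 2))  # 0.8, compared with the baseline P(S) = 0.2
```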
We can visualize the restriction of the sample space using a diagram.
The diagram illustrates how conditioning on event W (emails containing "winner") restricts the focus to the blue area. The conditional probability P(S∣W) is the proportion of the intersection (red overlapping part) relative to the size of the conditioned space (blue area).
Now, what if knowing that event B occurred doesn't change the probability of event A at all? In such cases, we say that events A and B are independent.
Formally, two events A and B are independent if:
P(A∣B) = P(A)
This assumes P(B)>0. Similarly, if P(A)>0, independence also means P(B∣A) = P(B).
If we substitute the definition of conditional probability into the independence condition P(A∣B)=P(A), we get:
P(A∩B) / P(B) = P(A)
Multiplying both sides by P(B) gives us a very useful alternative definition for independence:
Two events A and B are independent if and only if:
P(A∩B) = P(A)P(B)
This formula is often the easiest way to check for independence if you know the probabilities of the individual events and their intersection. It also holds even if P(A) or P(B) is zero.
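As a quick illustration, the following Python sketch simulates two independent fair coin flips per trial (an assumed setup, not drawn from the examples above), with A = "the first flip is heads" and B = "the second flip is heads", and checks that the estimated P(A∩B) is close to P(A)P(B):

```python
import random

random.seed(0)
n = 100_000

count_a = count_b = count_ab = 0
for _ in range(n):
    first = random.random() < 0.5   # first coin flip is heads
    second = random.random() < 0.5  # second coin flip is heads
    count_a += first
    count_b += second
    count_ab += first and second

p_a, p_b, p_ab = count_a / n, count_b / n, count_ab / n

# For independent events, P(A ∩ B) should be close to P(A) * P(B)
print(f"P(A ∩ B) ≈ {p_ab:.4f},  P(A) * P(B) ≈ {p_a * p_b:.4f}")
```

Both estimates come out near 0.25. Dependent events, like S and W in the spam example, would fail this check: P(S∩W)=0.08 is much larger than P(S)P(W)=0.02.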
Understanding conditional probability and independence is fundamental in machine learning, as the spam filtering example above suggests.
Mastering how to calculate and interpret P(A∣B) and how to determine if events are independent forms a critical step towards understanding more complex statistical methods and machine learning algorithms. These concepts pave the way for understanding Bayes' Theorem, which provides a mechanism for reversing the direction of conditioning.