Okay, let's continue our exploration of probability. We've just seen how conditional probability, P(A∣B), helps us understand the likelihood of an event A happening given that event B has already occurred. Now, we'll look at a famous and incredibly useful result that builds directly on this idea: Bayes' Theorem.
Updating Beliefs with Bayes' Theorem
Imagine you have an initial belief about something (a hypothesis), and then you receive some new data or evidence. How should that new evidence change your belief? Bayes' Theorem provides a formal way to do exactly this: update your beliefs in light of new evidence.
Think about a common scenario: a medical test.
- Let D be the event that a person has a particular disease.
- Let T be the event that the person tests positive for the disease.
We often know information like:
- P(T∣D): The probability of testing positive if you actually have the disease (the test's sensitivity).
- P(T∣not D): The probability of testing positive even if you don't have the disease (the false positive rate).
- P(D): The overall probability that a person in the population has the disease before taking the test (the base rate or prevalence). This is our initial belief, or prior probability.
But what we usually want to know after getting a test result is:
- P(D∣T): The probability that you actually have the disease given that you tested positive.
Notice the difference? We often know P(Evidence∣Hypothesis) but want P(Hypothesis∣Evidence). Bayes' Theorem lets us calculate this "flipped" conditional probability.
The Formula
Bayes' Theorem is stated as:
P(A∣B)=P(B∣A)×P(A)/P(B)
Let's break down each part using our medical test example (where A=D and B=T):
P(D∣T)=P(T∣D)×P(D)/P(T)
- P(D∣T): This is the Posterior Probability. It's the updated probability of having the disease (D) after considering the evidence (testing positive, T). This is what we want to calculate.
- P(T∣D): This is the Likelihood. It's the probability of observing the evidence (positive test, T) given that the hypothesis (having the disease, D) is true. This is often known from test specifications (sensitivity).
- P(D): This is the Prior Probability. It's our initial belief about the hypothesis (having the disease, D) before seeing the evidence. This is the prevalence of the disease in the population.
- P(T): This is the Probability of the Evidence. It's the overall probability of testing positive (T), regardless of whether the person has the disease or not. It acts as a normalizing constant to ensure the posterior probability is a valid probability (between 0 and 1).
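To make these four pieces concrete, here is a minimal Python sketch of the formula exactly as written above. The function and argument names (`bayes_posterior`, `prior`, `likelihood`, `evidence`) are just illustrative choices, not something defined elsewhere in the course:

```python
def bayes_posterior(prior, likelihood, evidence):
    """Bayes' Theorem: posterior = likelihood * prior / evidence.

    prior      -- P(D), belief in the hypothesis before seeing the evidence
    likelihood -- P(T|D), probability of the evidence if the hypothesis is true
    evidence   -- P(T), overall probability of the evidence
    """
    return likelihood * prior / evidence
```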
Calculating the Probability of the Evidence P(B)
How do we find P(T), the overall probability of testing positive? A person can test positive in two ways: they have the disease and test positive, OR they don't have the disease and test positive (a false positive). We use the law of total probability:
P(T)=P(T∩D)+P(T∩not D)
Using the definition of conditional probability (P(A∩B)=P(A∣B)P(B)), we can rewrite this as:
P(T)=P(T∣D)P(D)+P(T∣not D)P(not D)
So, the full Bayes' Theorem formula often looks like this:
P(D∣T)=P(T∣D)P(D)/[P(T∣D)P(D)+P(T∣not D)P(not D)]
This looks more complex, but remember the denominator is just the sum of probabilities of all the ways the evidence (positive test) could have happened.
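In code, we can fold this denominator into the calculation so the caller only supplies the three quantities we usually know: the prior, the likelihood under the hypothesis, and the likelihood under its negation. This is a sketch with made-up names, not a standard library function:

```python
def bayes_posterior_from_rates(prior, p_evidence_given_h, p_evidence_given_not_h):
    """Bayes' Theorem with the law of total probability in the denominator.

    prior                  -- P(D), e.g. the disease prevalence
    p_evidence_given_h     -- P(T|D), e.g. the test's sensitivity
    p_evidence_given_not_h -- P(T|not D), e.g. the false positive rate
    """
    # P(T) = P(T|D)P(D) + P(T|not D)P(not D)
    evidence = p_evidence_given_h * prior + p_evidence_given_not_h * (1 - prior)
    return p_evidence_given_h * prior / evidence
```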
A Simple Example Calculation
Let's put some numbers to our medical test example:
- Suppose a disease affects 1% of the population. So, P(D)=0.01. This implies P(not D)=1−0.01=0.99.
- The test correctly identifies 95% of people who have the disease. So, P(T∣D)=0.95. (Sensitivity)
- The test incorrectly indicates a positive result for 5% of people who don't have the disease. So, P(T∣not D)=0.05. (False Positive Rate)
Now, someone tests positive. What's the probability they actually have the disease, P(D∣T)?
Let's use the formula:
- Numerator: P(T∣D)P(D)=0.95×0.01=0.0095
- Denominator:
  - P(T∣D)P(D)=0.95×0.01=0.0095 (True Positives)
  - P(T∣not D)P(not D)=0.05×0.99=0.0495 (False Positives)
  - P(T)=0.0095+0.0495=0.0590
- Posterior Probability:
P(D∣T)=0.0095/0.0590≈0.161
So, even with a positive test result, the probability of actually having the disease is only about 16.1%! This might seem surprisingly low, but it makes sense when you consider the low prior probability (only 1% have the disease) and the possibility of false positives. The evidence (positive test) did increase our belief significantly (from 1% prior to 16.1% posterior), but the chance of a false positive is still substantial compared to the chance of a true positive in this scenario.
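If you defined the hypothetical `bayes_posterior_from_rates` function sketched earlier, plugging in the same numbers reproduces this result:

```python
posterior = bayes_posterior_from_rates(
    prior=0.01,                   # P(D): 1% prevalence
    p_evidence_given_h=0.95,      # P(T|D): sensitivity
    p_evidence_given_not_h=0.05,  # P(T|not D): false positive rate
)
print(round(posterior, 3))  # 0.161
```

A frequency view gives the same intuition: in a group of 10,000 people, about 100 have the disease and roughly 95 of them test positive, while about 495 of the 9,900 healthy people also test positive. Only 95 of the 590 positives (about 16%) are genuine.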
[Diagram: the Prior Belief about having the disease, combined with the new Evidence of a positive test, leads to an updated Posterior Belief via Bayes' Theorem; the calculation incorporates the likelihood of the evidence under different scenarios.]
Relevance to Machine Learning
Bayes' Theorem is more than just a formula; it's a fundamental concept for reasoning under uncertainty. In machine learning:
- Classification Models: Some classification algorithms (like Naive Bayes classifiers, which you might encounter later) are directly built on applying Bayes' Theorem. They calculate the probability of a class (like 'spam' or 'not spam') given the observed features (like words in an email); a small sketch of this appears just after this list.
- Updating Models: The core idea of updating beliefs based on data mirrors how many machine learning models learn. They start with some initial parameters (prior beliefs) and adjust them as they process more data (evidence) to get better parameters (posterior beliefs).
- Bayesian Methods: There's a whole branch of statistics and machine learning called Bayesian methods that uses this theorem extensively to model uncertainty in predictions and parameters.
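As a small glimpse of the first point above, here is a toy spam-filtering sketch using scikit-learn's MultinomialNB classifier. It assumes scikit-learn is installed, and the example messages and labels are made up purely for illustration:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Made-up training data: 1 = spam, 0 = not spam
messages = [
    "win a free prize now",
    "meeting at noon tomorrow",
    "claim your free prize",
    "are we still on for lunch tomorrow",
]
labels = [1, 0, 1, 0]

# Turn each message into word counts -- the observed features (the evidence)
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(messages)

# The classifier combines P(features | class) with class priors via Bayes' Theorem
model = MultinomialNB()
model.fit(X, labels)

new_message = vectorizer.transform(["free prize tomorrow"])
print(model.predict_proba(new_message))  # posterior probabilities for 'not spam' and 'spam'
```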
While we won't get into complex Bayesian modeling in this introductory course, understanding the basic concept of Bayes' Theorem is helpful. It formalizes the intuitive process of adjusting your understanding as you gather more information, which is central to how both humans and machines learn from data.