Now that we understand what a Probability Mass Function (PMF) is for discrete random variables, let's look at the simplest possible discrete probability distribution: the Bernoulli distribution.
Imagine an experiment that has only two possible outcomes. Think of flipping a coin (Heads or Tails), checking if an email is spam (Spam or Not Spam), or a user clicking an ad (Click or No Click). These are examples of Bernoulli trials.
A Bernoulli trial is a single random experiment with exactly two mutually exclusive outcomes, typically labeled as "success" and "failure".
The Bernoulli distribution models the probability of the outcome of a single Bernoulli trial. It depends on just one parameter: p, the probability of "success".
Since there are only two outcomes, the probability of "failure" must be 1−p.
Let's define a random variable X that represents the outcome of a Bernoulli trial. Conventionally, we set X=1 if the outcome is a success and X=0 if it is a failure.
The Probability Mass Function (PMF) for a Bernoulli random variable X is straightforward:
P(X=1) = p
P(X=0) = 1 − p
This specifies the probability for each of the two possible values that X can take. Sometimes, you might see this written more compactly using a single formula:
P(X=k) = p^k (1 − p)^(1−k)  for k ∈ {0, 1}
Let's check that this compact formula works. If k=1 (success), the formula gives p^1 (1 − p)^(1−1) = p^1 (1 − p)^0 = p × 1 = p. Correct. If k=0 (failure), the formula gives p^0 (1 − p)^(1−0) = p^0 (1 − p)^1 = 1 × (1 − p) = 1 − p. Correct.
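The compact formula translates directly into code. Here is a minimal sketch of a Bernoulli PMF function (the function name is our own choice, not from any particular library):

```python
def bernoulli_pmf(k, p):
    """PMF of a Bernoulli(p) random variable: p^k * (1 - p)^(1 - k)."""
    if k not in (0, 1):
        raise ValueError("k must be 0 or 1 for a Bernoulli random variable")
    return p ** k * (1 - p) ** (1 - k)

# Reproduces both cases of the piecewise definition:
print(bernoulli_pmf(1, 0.7))  # p, i.e. 0.7
print(bernoulli_pmf(0, 0.7))  # 1 - p, i.e. approximately 0.3
```

The exponent trick simply switches between the two cases: when k=1 the (1 − p) factor drops out, and when k=0 the p factor drops out.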
Since there are only two outcomes, the visualization is simple. It's a bar chart with two bars. Let's visualize a Bernoulli distribution where the probability of success (p) is 0.7:
PMF for a Bernoulli distribution with p=0.7. The probability of failure (X=0) is 1−p=0.3, and the probability of success (X=1) is p=0.7.
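A chart like the one above can be produced with a few lines of Matplotlib; this is one possible sketch, assuming Matplotlib is installed (labels and colors are our own choices):

```python
import matplotlib.pyplot as plt

p = 0.7
outcomes = [0, 1]
probabilities = [1 - p, p]  # P(X=0) and P(X=1)

# One bar per possible value of X.
plt.bar(outcomes, probabilities, width=0.4)
plt.xticks(outcomes, ["0 (failure)", "1 (success)"])
plt.ylabel("Probability")
plt.title("Bernoulli PMF, p = 0.7")
plt.show()
```

Because a Bernoulli variable takes only two values, the two bar heights must sum to 1.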
The Bernoulli distribution is fundamental because it's the building block for more complex distributions that deal with multiple trials, such as the Binomial distribution, which we'll look at next. In machine learning, it often appears when modeling binary outcomes, as in logistic regression, where the model predicts the probability of a positive class (a "success"). Understanding this simple distribution provides a solid base for exploring more involved probability concepts.
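As a quick sanity check, we can simulate many Bernoulli trials and confirm that the empirical success rate approaches p. This sketch uses NumPy's generator API; drawing from a Binomial distribution with n=1 is equivalent to drawing Bernoulli samples:

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # fixed seed for reproducibility
p = 0.7

# 100,000 Bernoulli trials: each sample is 1 with probability p, else 0.
samples = rng.binomial(n=1, p=p, size=100_000)

empirical_p = samples.mean()
print(f"Empirical success rate: {empirical_p:.3f}")  # close to 0.7
```

By the law of large numbers, the empirical rate converges to p as the number of trials grows, which is exactly the connection the Binomial distribution will make precise.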
© 2025 ApX Machine Learning