The Bernoulli and Binomial distributions are two fundamental probability distributions for discrete outcomes. They model simple "yes/no" or "success/failure" scenarios, which are surprisingly common in data analysis and machine learning.

## The Bernoulli Distribution: One Trial, Two Outcomes

Imagine the simplest possible random experiment with only two outcomes: success or failure. A single coin flip (Heads or Tails), a single click on an ad (Clicked or Not Clicked), or a single email being classified (Spam or Not Spam). The Bernoulli distribution describes the probability of these single-trial events.

A random variable $X$ follows a Bernoulli distribution if it can take only two values, typically represented as 1 (success) and 0 (failure). The distribution is defined by a single parameter, $p$, which represents the probability of success ($P(X=1)$). Consequently, the probability of failure is $P(X=0) = 1 - p$.

**Probability Mass Function (PMF):**

The PMF gives the probability of each possible outcome. For a Bernoulli random variable $X$, the PMF is:

$$
P(X=k | p) = \begin{cases} p & \text{if } k=1 \text{ (success)} \\ 1-p & \text{if } k=0 \text{ (failure)} \end{cases}
$$

This can be written more compactly as:

$$
P(X=k | p) = p^k (1-p)^{1-k} \quad \text{for } k \in \{0, 1\}
$$

**Properties:**

- **Expected Value:** The average outcome you'd expect over many trials. For Bernoulli, $E[X] = p$.
- **Variance:** A measure of the spread or variability of the outcomes. For Bernoulli, $Var(X) = p(1-p)$.

The Bernoulli distribution is the fundamental building block for the Binomial distribution.

## The Binomial Distribution: Multiple Independent Trials

Now, what if we repeat a Bernoulli trial multiple times under the same conditions and count the number of successes? For example, flipping a fair coin 10 times and counting the number of heads, or testing 20 manufactured parts and counting how many are defective (assuming the probability of being defective is constant for each part and the tests are independent). This scenario is modeled by the Binomial distribution.

A random variable $X$ follows a Binomial distribution if it represents the total number of successes in $n$ independent and identical Bernoulli trials, where each trial has a probability of success $p$.

The Binomial distribution is characterized by two parameters:

- $n$: The total number of independent trials.
- $p$: The probability of success on any single trial.

We denote this as $X \sim Binomial(n, p)$.

**Probability Mass Function (PMF):**

The PMF gives the probability of observing exactly $k$ successes in $n$ trials:

$$
P(X=k | n, p) = \binom{n}{k} p^k (1-p)^{n-k} \quad \text{for } k \in \{0, 1, 2, \dots, n\}
$$

Where:

- $\binom{n}{k}$ (read "n choose k") is the binomial coefficient, representing the number of different ways to choose $k$ successes from $n$ trials. It's calculated as $\binom{n}{k} = \frac{n!}{k!(n-k)!}$, where $!$ denotes the factorial.
- $p^k (1-p)^{n-k}$ is the probability of any one specific sequence containing exactly $k$ successes and $n-k$ failures; multiplying by $\binom{n}{k}$ accounts for all such sequences.

**Properties:**

- **Expected Value:** The average number of successes in $n$ trials. For Binomial, $E[X] = np$. This makes intuitive sense: if you flip a coin 10 times ($n=10$) with a probability of heads $p=0.5$, you expect $10 \times 0.5 = 5$ heads.
- **Variance:** The measure of spread for the number of successes. For Binomial, $Var(X) = np(1-p)$.
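To see where $E[X] = np$ and $Var(X) = np(1-p)$ come from, write the Binomial variable as a sum of $n$ independent Bernoulli variables and apply linearity of expectation; for the variance, the variances of independent variables also add:

$$
X = \sum_{i=1}^{n} X_i, \quad X_i \sim Bernoulli(p) \quad \Rightarrow \quad E[X] = \sum_{i=1}^{n} E[X_i] = np, \qquad Var(X) = \sum_{i=1}^{n} Var(X_i) = np(1-p)
$$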
**Example:** Suppose we flip a biased coin ($p=0.6$ for heads) 5 times ($n=5$). What's the probability of getting exactly 3 heads ($k=3$)? Using the PMF:

$$
\begin{aligned}
P(X=3 | n=5, p=0.6) &= \binom{5}{3} (0.6)^3 (1-0.6)^{5-3} = \frac{5!}{3!(5-3)!} (0.6)^3 (0.4)^2 \\
&= \frac{120}{6 \times 2} \times 0.216 \times 0.16 = 10 \times 0.216 \times 0.16 = 0.3456
\end{aligned}
$$

So, there's approximately a 34.6% chance of getting exactly 3 heads in 5 flips.

## Working with Bernoulli and Binomial Distributions in Python (SciPy)

The `scipy.stats` module provides convenient functions for working with these distributions.

```python
import numpy as np
from scipy.stats import bernoulli, binom
import matplotlib.pyplot as plt

# --- Bernoulli Example ---
p_success = 0.7  # Probability of success (e.g., click-through rate)

# Create a Bernoulli distribution object
rv_bern = bernoulli(p_success)

# Probability Mass Function (PMF)
print(f"Bernoulli PMF(k=1): {rv_bern.pmf(1):.4f}")  # P(X=1)
print(f"Bernoulli PMF(k=0): {rv_bern.pmf(0):.4f}")  # P(X=0)

# Expected Value and Variance
print(f"Bernoulli E[X]: {rv_bern.mean():.4f}")
print(f"Bernoulli Var(X): {rv_bern.var():.4f}")

# Generate random samples
print(f"Bernoulli samples (10): {rv_bern.rvs(size=10)}")
print("-" * 30)

# --- Binomial Example ---
n_trials = 10        # Number of trials (e.g., 10 emails checked)
p_success_bin = 0.2  # Probability of success in each trial (e.g., email being spam)

# Create a Binomial distribution object
rv_binom = binom(n_trials, p_success_bin)

# Probability Mass Function (PMF) for k=3 successes
k_successes = 3
print(f"Binomial PMF(k={k_successes}): {rv_binom.pmf(k_successes):.4f}")   # P(X=3)

# Cumulative Distribution Function (CDF) for k<=3 successes
print(f"Binomial CDF(k<={k_successes}): {rv_binom.cdf(k_successes):.4f}")  # P(X<=3)

# Expected Value and Variance
print(f"Binomial E[X]: {rv_binom.mean():.4f}")   # np
print(f"Binomial Var(X): {rv_binom.var():.4f}")  # np(1-p)

# Generate random samples (number of successes in 15 experiments of n_trials each)
print(f"Binomial samples (15): {rv_binom.rvs(size=15)}")

# --- Plotting Binomial PMF ---
k_values = np.arange(0, n_trials + 1)
pmf_values = rv_binom.pmf(k_values)

plt.bar(k_values, pmf_values)
plt.title("Binomial PMF (n=10, p=0.2)")
plt.xlabel("Number of Successes (k)")
plt.ylabel("Probability P(X=k)")
plt.show()
```

*Figure: Binomial distribution probability mass function for $n=10$ trials and a success probability $p=0.2$. The most likely outcome is 2 successes.*

The Bernoulli distribution models the single event, while the Binomial distribution aggregates the results of multiple independent Bernoulli events. Understanding these is essential as they form the basis for analyzing binary outcomes, which are frequent in classification problems (spam/not spam, malignant/benign), A/B testing results, and many other areas relevant to machine learning.
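As a quick sanity check of that relationship, the sketch below simulates the "sum of Bernoulli trials" construction directly with NumPy and compares the empirical mean and variance against $np$ and $np(1-p)$. The seed value and the number of simulated experiments (100,000) are arbitrary choices for this illustration:

```python
import numpy as np

rng = np.random.default_rng(42)  # seeded generator for reproducibility (seed is arbitrary)
n, p, n_experiments = 10, 0.2, 100_000

# Each row is one experiment: n independent Bernoulli(p) trials,
# generated by thresholding uniform draws at p
bern_trials = rng.random((n_experiments, n)) < p

# Summing the Bernoulli outcomes in each experiment yields Binomial(n, p) counts
binom_counts = bern_trials.sum(axis=1)

print(f"Empirical mean:     {binom_counts.mean():.4f}  (theory: np = {n * p})")
print(f"Empirical variance: {binom_counts.var():.4f}  (theory: np(1-p) = {n * p * (1 - p)})")
```

With enough experiments, both empirical values should land very close to the theoretical 2.0 and 1.6, mirroring what `rv_binom.mean()` and `rv_binom.var()` report above.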