Having explored the characteristics of several fundamental probability distributions, let's turn our attention to how we can practically work with them using Python. The SciPy library, specifically its stats module (scipy.stats), provides a comprehensive set of tools for interacting with a wide array of probability distributions. This capability is essential for statistical modeling, simulation, and various machine learning tasks.
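The scipy.stats module offers a consistent interface for many distributions, both continuous and discrete. For each distribution, you can typically perform several operations:
- Probability Density Function (PDF): for continuous distributions, evaluate the density at a given point with the .pdf() method.
- Probability Mass Function (PMF): for discrete distributions, compute the probability of a specific outcome with the .pmf() method.
- Cumulative Distribution Function (CDF): compute P(X <= x) with the .cdf() method.
- Percent Point Function (PPF): the inverse of the CDF (the quantile function); find the value x for a given cumulative probability with the .ppf() method.
- Random variates: draw random samples from the distribution with the .rvs() method.
Let's see how this works with some examples.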
The Normal (Gaussian) distribution is ubiquitous in statistics. In scipy.stats, it's represented by norm. To work with a specific Normal distribution, we often need to specify its mean (μ) using the loc parameter and its standard deviation (σ) using the scale parameter. Remember, the Normal distribution is often parameterized by variance σ², but SciPy uses standard deviation for scale.
import numpy as np
from scipy.stats import norm
import matplotlib.pyplot as plt # We'll use this for basic plotting setup
# Define a Normal distribution: mean=0, std_dev=2
mu = 0
sigma = 2
my_normal = norm(loc=mu, scale=sigma)
# Calculate the PDF at x=1
pdf_at_1 = my_normal.pdf(1)
print(f"PDF at x=1: {pdf_at_1:.4f}")
# Calculate the CDF at x=1 (P(X <= 1))
cdf_at_1 = my_normal.cdf(1)
print(f"CDF at x=1 (P(X <= 1)): {cdf_at_1:.4f}")
# Calculate the PPF for probability 0.95 (find the 95th percentile)
percentile_95 = my_normal.ppf(0.95)
print(f"95th Percentile (Value x such that P(X <= x) = 0.95): {percentile_95:.4f}")
# Generate 5 random samples from this distribution
random_samples = my_normal.rvs(size=5)
print(f"Five random samples: {random_samples}")
# Generate data points for plotting the PDF
x_values = np.linspace(mu - 4*sigma, mu + 4*sigma, 200) # Cover range around mean
pdf_values = my_normal.pdf(x_values)
[Figure: Normal Distribution PDF (μ=0, σ=2). x-axis: x; y-axis: Probability Density.]
Probability Density Function (PDF) of a Normal distribution with mean 0 and standard deviation 2.
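A few relationships between these methods are worth checking numerically. The following is a minimal sketch, assuming the same mean 0 and standard deviation 2 as above; the variable names (round_trip, p_interval, pdf_direct) are illustrative and not part of SciPy.

```python
from scipy.stats import norm

mu, sigma = 0, 2
my_normal = norm(loc=mu, scale=sigma)  # "frozen" distribution with fixed parameters

# ppf is the inverse of cdf: feeding the 95th percentile back into cdf returns 0.95
x_95 = my_normal.ppf(0.95)
round_trip = my_normal.cdf(x_95)
print(f"cdf(ppf(0.95)) = {round_trip:.4f}")

# Probability of an interval: P(a < X <= b) = CDF(b) - CDF(a)
a, b = mu - sigma, mu + sigma
p_interval = my_normal.cdf(b) - my_normal.cdf(a)
print(f"P({a} < X <= {b}) = {p_interval:.4f}")  # about 0.6827 within one standard deviation

# The same methods can be called on norm directly, passing loc and scale each time
pdf_direct = norm.pdf(1, loc=mu, scale=sigma)
print(f"PDF at x=1 without freezing: {pdf_direct:.4f}")
```

Freezing the distribution once is convenient when the parameters stay fixed, while the direct call style can be handy when loc and scale vary from call to call.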
The Binomial distribution models the number of successes k in a fixed number n of independent Bernoulli trials, each with a probability of success p. In scipy.stats, it's represented by binom. We need to specify n and p.
from scipy.stats import binom
# Define a Binomial distribution: n=10 trials, p=0.5 probability of success
n_trials = 10
prob_success = 0.5
my_binomial = binom(n=n_trials, p=prob_success)
# Calculate the PMF for k=5 successes (P(X=5))
pmf_at_5 = my_binomial.pmf(5)
print(f"PMF at k=5 (P(X=5)): {pmf_at_5:.4f}")
# Calculate the CDF at k=5 (P(X <= 5))
cdf_at_5 = my_binomial.cdf(5)
print(f"CDF at k=5 (P(X <= 5)): {cdf_at_5:.4f}")
# Calculate the PPF for probability 0.9 (find k such that P(X <= k) >= 0.9)
# Note: For discrete distributions, PPF gives the smallest k satisfying the condition.
quantile_90 = my_binomial.ppf(0.9)
print(f"Value k such that P(X <= k) >= 0.9: {quantile_90}")
# Generate 10 random samples (number of successes in 10 trials)
random_samples_binom = my_binomial.rvs(size=10)
print(f"Ten random samples (number of successes): {random_samples_binom}")
# Generate data for plotting the PMF
k_values = np.arange(0, n_trials + 1)
pmf_values = my_binomial.pmf(k_values)
Probability Mass Function (PMF) of a Binomial distribution with n=10 trials and success probability p=0.5.
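To connect these methods back to the underlying definitions, here is a small sketch, assuming the same n=10 and p=0.5 as above. It compares the PMF with the closed-form Binomial formula, rebuilds the CDF as a running sum of the PMF, and checks what the PPF returns for a discrete distribution. The names pmf_manual, cdf_from_pmf, and k_90 are illustrative only.

```python
from math import comb

import numpy as np
from scipy.stats import binom

n_trials, prob_success = 10, 0.5
my_binomial = binom(n=n_trials, p=prob_success)

# PMF from the closed-form formula: C(n, k) * p^k * (1 - p)^(n - k)
k = 5
pmf_manual = comb(n_trials, k) * prob_success**k * (1 - prob_success)**(n_trials - k)
print(f"Manual PMF at k=5: {pmf_manual:.4f}, SciPy: {my_binomial.pmf(k):.4f}")

# The CDF at k is the running sum of the PMF over 0..k
k_values = np.arange(0, n_trials + 1)
cdf_from_pmf = np.cumsum(my_binomial.pmf(k_values))
print(f"Summed PMF up to k=5: {cdf_from_pmf[5]:.4f}, SciPy CDF: {my_binomial.cdf(5):.4f}")

# ppf(q) returns the smallest k whose CDF reaches q
k_90 = int(my_binomial.ppf(0.9))
print(f"ppf(0.9) = {k_90}")
print(f"CDF at k={k_90 - 1}: {my_binomial.cdf(k_90 - 1):.4f} (below 0.9)")
print(f"CDF at k={k_90}: {my_binomial.cdf(k_90):.4f} (at least 0.9)")
```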
The pattern shown above applies similarly to other distributions available in scipy.stats:
- Poisson: poisson(mu), where mu is the rate parameter λ. Methods include .pmf(), .cdf(), .ppf(), and .rvs().
- Exponential: expon(scale=1/lambda), where scale corresponds to 1/λ, the inverse of the rate parameter λ. You can additionally use loc to shift the distribution. Methods include .pdf(), .cdf(), .ppf(), and .rvs().
- Uniform: uniform(loc=a, scale=b-a) for a uniform distribution over the interval [a,b). The loc parameter defines the start point a, and scale defines the width b−a. Methods include .pdf(), .cdf(), .ppf(), and .rvs().
For instance, to find the probability of observing exactly 3 events for a Poisson distribution with an average rate (λ) of 4 events per interval:
from scipy.stats import poisson
lambda_rate = 4
my_poisson = poisson(mu=lambda_rate)
pmf_at_3 = my_poisson.pmf(3)
print(f"Poisson PMF at k=3 (lambda=4): {pmf_at_3:.4f}")
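The exponential and uniform parameterizations described above can be sketched in the same way. The rate 0.5 and the interval endpoints 2 and 5 below are hypothetical values chosen only for illustration.

```python
import numpy as np
from scipy.stats import expon, uniform

# Exponential with rate lambda = 0.5 (hypothetical value), so scale = 1 / lambda = 2
lambda_rate = 0.5
my_expon = expon(scale=1 / lambda_rate)
x = 3.0
cdf_manual = 1 - np.exp(-lambda_rate * x)  # closed form: 1 - exp(-lambda * x)
print(f"Exponential mean: {my_expon.mean():.4f}")
print(f"Exponential CDF at x=3: {my_expon.cdf(x):.4f} (closed form: {cdf_manual:.4f})")

# Uniform between a=2 and b=5 (hypothetical values): loc = a, scale = b - a
a, b = 2.0, 5.0
my_uniform = uniform(loc=a, scale=b - a)
print(f"Uniform PDF at x=3: {my_uniform.pdf(3.0):.4f}")
print(f"Uniform CDF at the midpoint: {my_uniform.cdf((a + b) / 2):.4f}")
print(f"Uniform PDF outside the interval (x=6): {my_uniform.pdf(6.0):.4f}")
```

A common slip is passing the rate itself as scale for the exponential; checking that the mean equals 1/λ is a quick way to catch it.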
Being able to compute probabilities and generate samples from these standard distributions using SciPy is a fundamental skill. It allows you to simulate processes, test hypotheses (as we'll see later), and build components for more complex machine learning models that rely on probabilistic assumptions. Familiarity with the scipy.stats interface for these common distributions will prove highly beneficial.
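As a rough illustration of the simulation use case, the sketch below draws a large sample from the Binomial distribution used earlier and compares empirical estimates with the exact values. The seed 42 and the sample size of 100,000 are arbitrary choices; passing random_state to .rvs() is one way to make the draws repeatable.

```python
import numpy as np
from scipy.stats import binom

my_binomial = binom(n=10, p=0.5)

# Passing random_state makes the draw repeatable; the seed 42 is arbitrary
samples = my_binomial.rvs(size=100_000, random_state=42)
samples_again = my_binomial.rvs(size=100_000, random_state=42)
print(f"Same seed gives identical samples: {np.array_equal(samples, samples_again)}")

# Empirical estimates should land close to the exact values
print(f"Sample mean: {samples.mean():.4f} vs exact mean: {my_binomial.mean():.4f}")
empirical_p = np.mean(samples <= 5)
print(f"Empirical P(X <= 5): {empirical_p:.4f} vs exact CDF: {my_binomial.cdf(5):.4f}")
```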
© 2025 ApX Machine Learning