The SciPy library offers practical tools for working with probability distributions in Python. Its stats module (scipy.stats) provides a comprehensive set of functions for a wide array of distributions, which is essential for statistical modeling, simulation, and many machine learning tasks.

The scipy.stats module offers a consistent interface for many distributions, both continuous and discrete. For each distribution, you can typically perform several operations:

- Probability Density Function (PDF): For continuous distributions, the PDF, often denoted as $f(x)$, gives the relative likelihood (density) of the random variable at a value $x$. It is a density, not a probability. Use the .pdf() method.
- Probability Mass Function (PMF): For discrete distributions, the PMF, often denoted as $P(X=k)$, gives the probability that the discrete random variable $X$ is exactly equal to some value $k$. Use the .pmf() method.
- Cumulative Distribution Function (CDF): The CDF, $F(x) = P(X \le x)$, gives the probability that a random variable $X$ takes on a value less than or equal to $x$. Use the .cdf() method.
- Percent Point Function (PPF): Also known as the quantile function or the inverse CDF. Given a probability $p$, the PPF finds the value $x$ such that $F(x) = p$. Use the .ppf() method.
- Random Variates Sampling (RVS): Generates random numbers following the specified distribution. Use the .rvs() method.

Let's see how this works with some examples.

Working with Continuous Distributions: The Normal Distribution

The Normal (Gaussian) distribution is ubiquitous in statistics. In scipy.stats, it's represented by norm. To work with a specific Normal distribution, specify its mean ($\mu$) using the loc parameter and its standard deviation ($\sigma$) using the scale parameter.
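
To make the loc and scale parameters concrete, here is a minimal sketch. The variable names (frozen, direct_pdf, frozen_pdf) are illustrative and not from the original text. It assumes only the norm interface described above, and checks that fixing the parameters once (SciPy calls this a "frozen" distribution) gives the same result as passing them on each call.

```python
import numpy as np
from scipy.stats import norm

# A "frozen" distribution stores loc and scale for all later method calls
frozen = norm(loc=0, scale=2)
frozen_pdf = frozen.pdf(1)

# The same density, passing loc and scale directly to the method
direct_pdf = norm.pdf(1, loc=0, scale=2)
print(f"Frozen and direct calls agree: {np.isclose(frozen_pdf, direct_pdf)}")

# loc is the mean and scale is the standard deviation,
# so the variance is scale squared
print(f"mean={frozen.mean()}, std={frozen.std()}, var={frozen.var()}")
```

Freezing is convenient when the same distribution is queried many times; the direct form is handy for one-off calculations.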
Remember, the Normal distribution is often parameterized by variance $\sigma^2$, but SciPy uses standard deviation for scale.

```python
import numpy as np
from scipy.stats import norm
import matplotlib.pyplot as plt  # We'll use this for basic plotting setup

# Define a Normal distribution: mean=0, std_dev=2
mu = 0
sigma = 2
my_normal = norm(loc=mu, scale=sigma)

# Calculate the PDF at x=1
pdf_at_1 = my_normal.pdf(1)
print(f"PDF at x=1: {pdf_at_1:.4f}")

# Calculate the CDF at x=1 (P(X <= 1))
cdf_at_1 = my_normal.cdf(1)
print(f"CDF at x=1 (P(X <= 1)): {cdf_at_1:.4f}")

# Calculate the PPF for probability 0.95 (find the 95th percentile)
percentile_95 = my_normal.ppf(0.95)
print(f"95th Percentile (Value x such that P(X <= x) = 0.95): {percentile_95:.4f}")

# Generate 5 random samples from this distribution
random_samples = my_normal.rvs(size=5)
print(f"Five random samples: {random_samples}")

# Generate data points for plotting the PDF
x_values = np.linspace(mu - 4*sigma, mu + 4*sigma, 200)  # Cover range around mean
pdf_values = my_normal.pdf(x_values)
```

Figure: Probability Density Function (PDF) of a Normal distribution with mean 0 and standard deviation 2 (x-axis: x; y-axis: probability density).

Working with Discrete Distributions: The Binomial Distribution

The Binomial distribution models the number of successes $k$ in a fixed number $n$ of independent Bernoulli trials, each with a probability of success $p$. In scipy.stats, it's represented by binom.
We need to specify $n$ and $p$.

```python
import numpy as np
from scipy.stats import binom

# Define a Binomial distribution: n=10 trials, p=0.5 probability of success
n_trials = 10
prob_success = 0.5
my_binomial = binom(n=n_trials, p=prob_success)

# Calculate the PMF for k=5 successes (P(X=5))
pmf_at_5 = my_binomial.pmf(5)
print(f"PMF at k=5 (P(X=5)): {pmf_at_5:.4f}")

# Calculate the CDF at k=5 (P(X <= 5))
cdf_at_5 = my_binomial.cdf(5)
print(f"CDF at k=5 (P(X <= 5)): {cdf_at_5:.4f}")

# Calculate the PPF for probability 0.9 (find k such that P(X <= k) >= 0.9)
# Note: For discrete distributions, PPF gives the smallest k satisfying the condition.
quantile_90 = my_binomial.ppf(0.9)
print(f"Value k such that P(X <= k) >= 0.9: {quantile_90}")

# Generate 10 random samples (number of successes in 10 trials)
random_samples_binom = my_binomial.rvs(size=10)
print(f"Ten random samples (number of successes): {random_samples_binom}")

# Generate data for plotting the PMF
k_values = np.arange(0, n_trials + 1)
pmf_values = my_binomial.pmf(k_values)
```

Figure: Probability Mass Function (PMF) of a Binomial distribution with $n=10$ trials and success probability $p=0.5$ (x-axis: number of successes $k$; y-axis: probability mass).

Other Distributions

The pattern shown above applies similarly to other distributions available in scipy.stats:

- Poisson: Use poisson(mu) where mu is the rate parameter $\lambda$. Methods include .pmf(), .cdf(), .ppf(), .rvs().
- Exponential: Use expon(scale=1/rate), where rate stands for the rate parameter $\lambda$ (lambda itself is a reserved word in Python), so scale corresponds to $1/\lambda$. The optional loc parameter shifts the distribution. Methods include .pdf(), .cdf(), .ppf(), .rvs().
- Uniform: Use uniform(loc=a, scale=b-a) for a uniform distribution over the interval $[a, b]$. The loc parameter defines the start point $a$, and scale defines the width $b-a$. Methods include .pdf(), .cdf(), .ppf(), .rvs().

For instance, to find the probability of observing exactly 3 events for a Poisson distribution with an average rate ($\lambda$) of 4 events per interval:

```python
from scipy.stats import poisson

lambda_rate = 4
my_poisson = poisson(mu=lambda_rate)

pmf_at_3 = my_poisson.pmf(3)
print(f"Poisson PMF at k=3 (lambda=4): {pmf_at_3:.4f}")
```

Being able to compute probabilities and generate samples from these standard distributions using SciPy is a fundamental skill. It allows you to simulate processes, test hypotheses (as we'll see later), and build components for more complex machine learning models that rely on probabilistic assumptions. Familiarity with the scipy.stats interface for these common distributions will prove highly beneficial.
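
The Exponential and Uniform parameterizations listed above are easy to get wrong, so here is a minimal sketch that checks them against closed-form values. The rate of 0.5, the interval endpoints 2 and 5, and all variable names are arbitrary illustrative choices, not values from the original text.

```python
import numpy as np
from scipy.stats import expon, uniform

# Exponential with rate 0.5: scale is 1/rate, so the mean is 1/rate = 2
rate = 0.5
my_expon = expon(scale=1 / rate)
print(f"Exponential mean: {my_expon.mean():.4f}")

# The CDF should match the closed form 1 - exp(-rate * x)
x = 3.0
closed_form_cdf = 1 - np.exp(-rate * x)
print(f"CDF matches closed form: {np.isclose(my_expon.cdf(x), closed_form_cdf)}")

# Uniform from a=2 to b=5: loc is the start point, scale is the width b - a
a, b = 2.0, 5.0
my_uniform = uniform(loc=a, scale=b - a)
print(f"Uniform PDF at 3.0: {my_uniform.pdf(3.0):.4f}")  # density 1/(b - a) inside the interval
print(f"Uniform CDF at 3.5: {my_uniform.cdf(3.5):.4f}")  # (3.5 - a) / (b - a)

# The PPF inverts the CDF: the median of this uniform is the midpoint
median = my_uniform.ppf(0.5)
print(f"Uniform median: {median:.4f}")
```

A common slip is writing uniform(a, b), which SciPy interprets as loc=a and scale=b, that is, the interval from a to a+b rather than from a to b.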