The SciPy library provides practical tools for working with probability distributions in Python. Specifically, its stats module (scipy.stats) provides a comprehensive set of functions for a wide array of these distributions. This capability is essential for statistical modeling, simulation, and various machine learning tasks.
The scipy.stats module offers a consistent interface for many distributions, both continuous and discrete. For each distribution, you can typically perform several operations:
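- Evaluate the probability density function (PDF) of a continuous distribution with the .pdf() method.
- Evaluate the probability mass function (PMF) of a discrete distribution with the .pmf() method.
- Evaluate the cumulative distribution function (CDF), P(X <= x), with the .cdf() method.
- Evaluate the percent point function (PPF), the inverse of the CDF, to find quantiles with the .ppf() method.
- Draw random samples (random variates) with the .rvs() method.

Let's see how this works with some examples.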
The Normal (Gaussian) distribution is ubiquitous in statistics. In scipy.stats, it's represented by norm. To work with a specific Normal distribution, we specify its mean (μ) using the loc parameter and its standard deviation (σ) using the scale parameter. Remember, the Normal distribution is often parameterized by its variance σ², but SciPy's scale parameter takes the standard deviation σ.
import numpy as np
from scipy.stats import norm
import matplotlib.pyplot as plt # We'll use this for basic plotting setup
# Define a Normal distribution: mean=0, std_dev=2
mu = 0
sigma = 2
my_normal = norm(loc=mu, scale=sigma)
# Calculate the PDF at x=1
pdf_at_1 = my_normal.pdf(1)
print(f"PDF at x=1: {pdf_at_1:.4f}")
# Calculate the CDF at x=1 (P(X <= 1))
cdf_at_1 = my_normal.cdf(1)
print(f"CDF at x=1 (P(X <= 1)): {cdf_at_1:.4f}")
# Calculate the PPF for probability 0.95 (find the 95th percentile)
percentile_95 = my_normal.ppf(0.95)
print(f"95th Percentile (Value x such that P(X <= x) = 0.95): {percentile_95:.4f}")
# Generate 5 random samples from this distribution
random_samples = my_normal.rvs(size=5)
print(f"Five random samples: {random_samples}")
# Generate data points for plotting the PDF
x_values = np.linspace(mu - 4*sigma, mu + 4*sigma, 200) # Cover range around mean
pdf_values = my_normal.pdf(x_values)
Probability Density Function (PDF) of a Normal distribution with mean 0 and standard deviation 2.
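Because scale expects the standard deviation, a distribution that is given by its variance needs a square root first. The following is a minimal sketch of that conversion, assuming a hypothetical variance of 4; the variable names are illustrative and not part of the example above. It also checks that .ppf() inverts .cdf(), and that the same methods can be called on norm directly without freezing the distribution.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical input: a Normal distribution specified by its variance
variance = 4.0
sigma_from_var = np.sqrt(variance)  # scale must be the standard deviation

dist = norm(loc=0, scale=sigma_from_var)

# The frozen distribution reports its own standard deviation and variance
print(f"std: {dist.std():.1f}, var: {dist.var():.1f}")

# .ppf() is the inverse of .cdf(): applying one after the other recovers x
x = 1.0
round_trip = dist.ppf(dist.cdf(x))
print(f"ppf(cdf({x})) = {round_trip:.4f}")

# The unfrozen form takes loc and scale on every call and gives the same value
pdf_unfrozen = norm.pdf(x, loc=0, scale=sigma_from_var)
print(f"Frozen and unfrozen PDF agree: {np.isclose(pdf_unfrozen, dist.pdf(x))}")
```

Freezing the distribution once, as in the example above, is usually more convenient when several methods are called with the same parameters.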
The Binomial distribution models the number of successes k in a fixed number n of independent Bernoulli trials, each with a probability of success p. In scipy.stats, it's represented by binom. We need to specify n and p.
from scipy.stats import binom
# Define a Binomial distribution: n=10 trials, p=0.5 probability of success
n_trials = 10
prob_success = 0.5
my_binomial = binom(n=n_trials, p=prob_success)
# Calculate the PMF for k=5 successes (P(X=5))
pmf_at_5 = my_binomial.pmf(5)
print(f"PMF at k=5 (P(X=5)): {pmf_at_5:.4f}")
# Calculate the CDF at k=5 (P(X <= 5))
cdf_at_5 = my_binomial.cdf(5)
print(f"CDF at k=5 (P(X <= 5)): {cdf_at_5:.4f}")
# Calculate the PPF for probability 0.9 (find k such that P(X <= k) >= 0.9)
# Note: For discrete distributions, PPF gives the smallest k satisfying the condition.
quantile_90 = my_binomial.ppf(0.9)
print(f"Value k such that P(X <= k) >= 0.9: {quantile_90}")
# Generate 10 random samples (number of successes in 10 trials)
random_samples_binom = my_binomial.rvs(size=10)
print(f"Ten random samples (number of successes): {random_samples_binom}")
# Generate data for plotting the PMF
k_values = np.arange(0, n_trials + 1)
pmf_values = my_binomial.pmf(k_values)
Probability Mass Function (PMF) of a Binomial distribution with n=10 trials and success probability p=0.5.
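For a discrete distribution, the relationship between the three functions can be checked numerically. The sketch below, which uses the same n=10 and p=0.5 with illustrative variable names, verifies that the PMF sums to 1 over the support, that the CDF is the running sum of the PMF, and that .ppf(q) returns the smallest k whose CDF reaches q.

```python
import numpy as np
from scipy.stats import binom

dist = binom(n=10, p=0.5)
k = np.arange(0, 11)  # full support: 0..10 successes
pmf = dist.pmf(k)

# The PMF over the full support sums to 1
total = pmf.sum()
print(f"Sum of PMF: {total:.4f}")

# The CDF at k is the running sum of the PMF up to and including k
cdf_from_pmf = np.cumsum(pmf)
print(f"CDF matches cumulative PMF: {np.allclose(cdf_from_pmf, dist.cdf(k))}")

# ppf(q) is the smallest k with P(X <= k) >= q
q = 0.9
k_q = int(dist.ppf(q))
print(f"P(X <= {k_q - 1}) = {dist.cdf(k_q - 1):.4f}  (below {q})")
print(f"P(X <= {k_q}) = {dist.cdf(k_q):.4f}  (at or above {q})")
```

Note that .ppf() returns a float even for discrete distributions, which is why the result is cast to int before being used as a count.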
The pattern shown above applies similarly to other distributions available in scipy.stats:
- Poisson: poisson(mu), where mu is the rate parameter λ. Methods include .pmf(), .cdf(), .ppf(), .rvs().
- Exponential: expon(scale=1/λ), where scale is the inverse of the rate parameter λ. The loc parameter can additionally be used to shift the distribution. Methods include .pdf(), .cdf(), .ppf(), .rvs().
- Uniform: uniform(loc=a, scale=b-a) for a uniform distribution over the interval [a,b). The loc parameter defines the start point a, and scale defines the width b−a. Methods include .pdf(), .cdf(), .ppf(), .rvs().

For instance, to find the probability of observing exactly 3 events for a Poisson distribution with an average rate (λ) of 4 events per interval:
from scipy.stats import poisson
lambda_rate = 4
my_poisson = poisson(mu=lambda_rate)
pmf_at_3 = my_poisson.pmf(3)
print(f"Poisson PMF at k=3 (lambda=4): {pmf_at_3:.4f}")
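The loc and scale conventions for the Exponential and Uniform distributions are easy to get wrong, so a short check may help. This is a sketch under assumed values: a hypothetical rate of 2 for the Exponential and a hypothetical interval from 2 to 5 for the Uniform.

```python
import math
from scipy.stats import expon, uniform

# Exponential with hypothetical rate lambda = 2: scale is 1/lambda, not lambda
rate = 2.0
waiting_time = expon(scale=1 / rate)
print(f"Exponential mean (1/lambda): {waiting_time.mean():.4f}")

# P(X <= 1) should equal 1 - exp(-lambda * 1)
p_within_1 = waiting_time.cdf(1.0)
closed_form = 1 - math.exp(-rate * 1.0)
print(f"CDF at 1: {p_within_1:.4f}, closed form: {closed_form:.4f}")

# Uniform on a hypothetical interval from a=2 to b=5: loc=a, scale=b-a (width, not b)
a, b = 2.0, 5.0
flat = uniform(loc=a, scale=b - a)
density = flat.pdf(3.0)       # constant density 1/(b-a) inside the interval
halfway = flat.cdf((a + b) / 2)  # half the mass lies below the midpoint
print(f"Uniform density: {density:.4f}, CDF at midpoint: {halfway:.4f}")
```

A common mistake is to pass the upper bound b as scale; with these values that would describe the interval from 2 to 7 instead.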
Being able to compute probabilities and generate samples from these standard distributions using SciPy is a fundamental skill. It allows you to simulate processes, test hypotheses (as we'll see later), and build components for more complex machine learning models that rely on probabilistic assumptions. Familiarity with the scipy.stats interface for these common distributions will prove highly beneficial.
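As a small illustration of simulation, the sketch below estimates P(X <= 1) for the Normal distribution with mean 0 and standard deviation 2 by drawing samples, then compares the estimate with the exact value from .cdf(). The sample size and the seed are arbitrary choices made here for reproducibility, not values from the text.

```python
import numpy as np
from scipy.stats import norm

dist = norm(loc=0, scale=2)

# A seeded generator makes the samples reproducible (seed chosen arbitrarily)
rng = np.random.default_rng(42)
samples = dist.rvs(size=100_000, random_state=rng)

# Fraction of samples at or below 1 approximates P(X <= 1)
empirical = np.mean(samples <= 1)
exact = dist.cdf(1)

print(f"Empirical P(X <= 1): {empirical:.4f}")
print(f"Exact P(X <= 1):     {exact:.4f}")
```

The two values should agree to roughly two decimal places at this sample size, and the gap shrinks as more samples are drawn.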