Understanding probability distributions is one thing, but seeing them in action by generating data that follows these distributions can significantly aid comprehension. Modern scientific computing libraries in Python, particularly SciPy and NumPy, provide powerful tools to generate random numbers (samples) from a wide variety of probability distributions. This process is often called sampling.

Sampling is useful for many tasks, including:

Simulating random processes.
Understanding the shape and behavior of distributions visually.
Testing statistical methods or machine learning algorithms.
Generating synthetic data.

We will primarily use the scipy.stats module, which offers a consistent interface for working with distributions, including generating random variates (samples) using the .rvs() method. We'll also occasionally mention equivalent functions in numpy.random.
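Here is a short sketch of that consistent interface. Calling a distribution such as stats.binom with its parameters returns a "frozen" distribution object. Its .rvs(), .mean(), .var() and .pmf() methods all reuse those parameters. The random_state argument fixes the seed so the frozen and unfrozen draws can be compared. It is an addition for illustration; the examples later in this section do not use it.

```python
from scipy import stats

# Calling a distribution with its parameters "freezes" them into an object
binom_dist = stats.binom(n=10, p=0.5)

# The frozen object exposes sampling and summary methods through one interface
samples = binom_dist.rvs(size=5, random_state=0)  # five random success counts
print(samples)
print(binom_dist.mean(), binom_dist.var())       # n*p = 5 and n*p*(1-p) = 2.5
print(binom_dist.pmf(5))                          # P(X = 5)

# The unfrozen form passes the parameters on every call; same seed, same draws
same_samples = stats.binom.rvs(n=10, p=0.5, size=5, random_state=0)
print(same_samples)
```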

Let's start by importing the necessary libraries. We'll need scipy.stats for the distributions, numpy for numerical operations, and matplotlib.pyplot (conventionally imported as plt) for basic visualization. The Matplotlib plotting code is shown as comments for reference. The charts themselves are rendered as Plotly figures for interactive web display.

import numpy as np
from scipy import stats
import matplotlib.pyplot as plt  # Used by the commented-out plotting code below

# Configure visualizations (optional, helps make plots nicer with matplotlib)
# plt.style.use('seaborn-v0_8-whitegrid')

Generating Samples from Discrete Distributions

Discrete distributions deal with countable outcomes. We'll look at the Bernoulli and Binomial distributions.

Bernoulli Distribution

The Bernoulli distribution models a single trial with two possible outcomes: success (usually coded as 1) with probability $p$, and failure (usually coded as 0) with probability $1-p$. Think of a single, possibly biased, coin flip.

To generate samples from a Bernoulli distribution using scipy.stats, we use stats.bernoulli.rvs(). The main parameter is $p$, the probability of success.

# Parameters
prob_success = 0.7  # Probability of success (e.g., heads)
num_samples = 10    # Number of trials (samples) to generate

# Generate samples
# Each sample is either 0 or 1
bernoulli_samples = stats.bernoulli.rvs(p=prob_success, size=num_samples)

print(f"Bernoulli Samples (p={prob_success}): {bernoulli_samples}")

Running this might produce output like: Bernoulli Samples (p=0.7): [1 1 0 1 1 1 0 1 0 1]. Because no random seed is set, the exact values change from run to run. Each number represents the outcome of one trial. Over many samples, the proportion of 1s should be close to $p$.
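The claim that the proportion of 1s approaches $p$ can be checked with a larger sample. This sketch draws 100,000 trials with a fixed seed (random_state=42, an arbitrary choice) and uses the mean of the 0/1 outcomes as the proportion of successes. The name many_samples is illustrative.

```python
from scipy import stats

prob_success = 0.7
# Draw many Bernoulli trials; random_state fixes the seed so reruns match
many_samples = stats.bernoulli.rvs(p=prob_success, size=100_000, random_state=42)

# The mean of 0/1 outcomes equals the fraction of trials that were successes
proportion_of_ones = many_samples.mean()
print(f"Proportion of 1s: {proportion_of_ones:.4f} (p = {prob_success})")
```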

(Equivalent NumPy function: np.random.binomial(1, prob_success, size=num_samples))
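The np.random.binomial call above uses NumPy's older global-state API. For new code, NumPy's documentation recommends creating a Generator with np.random.default_rng(). This sketch shows the same Bernoulli draw through that interface, treating a Bernoulli trial as a Binomial with a single trial ($n=1$). The seed value 123 is arbitrary.

```python
import numpy as np

prob_success = 0.7
num_samples = 10

# Create a seeded Generator (NumPy's recommended interface for new code)
rng = np.random.default_rng(seed=123)

# A Bernoulli trial is a Binomial with a single trial (n=1)
bernoulli_np = rng.binomial(n=1, p=prob_success, size=num_samples)
print(bernoulli_np)
```

Seeding the Generator once and reusing it keeps a whole script reproducible without touching global random state.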

Binomial Distribution

The Binomial distribution models the number of successes in a fixed number, $n$, of independent Bernoulli trials, each with the same probability of success $p$. For example, counting the number of heads in 10 coin flips.
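Because a Binomial count is the number of successes in $n$ independent Bernoulli trials, summing rows of Bernoulli draws should produce the same distribution as drawing Binomial counts directly. This sketch compares the two. Both should have a mean near $np = 5$ and a variance near $np(1-p) = 2.5$. The seeds and the sample size of 100,000 are arbitrary choices for illustration.

```python
from scipy import stats

num_trials = 10
prob_success = 0.5
num_experiments = 100_000

# Each row holds one experiment: num_trials independent Bernoulli draws
trials = stats.bernoulli.rvs(p=prob_success, size=(num_experiments, num_trials), random_state=1)

# Summing across each row counts the successes in that experiment
counts_from_bernoulli = trials.sum(axis=1)

# Draw Binomial counts directly for comparison
counts_direct = stats.binom.rvs(n=num_trials, p=prob_success, size=num_experiments, random_state=2)

# Both should have mean near n*p = 5 and variance near n*p*(1-p) = 2.5
print(counts_from_bernoulli.mean(), counts_direct.mean())
print(counts_from_bernoulli.var(), counts_direct.var())
```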

We use stats.binom.rvs(), specifying $n$ (number of trials) and $p$ (probability of success per trial). The size parameter sets how many times the whole experiment is repeated, i.e., how many independent success counts are returned.

# Parameters
num_trials = 10     # Number of Bernoulli trials in one experiment (n)
prob_success = 0.5  # Probability of success in each trial (p)
num_experiments = 1000 # Number of times we run the experiment (generate samples)

# Generate samples
# Each sample is the count of successes in 'n' trials
binomial_samples = stats.binom.rvs(n=num_trials, p=prob_success, size=num_experiments)

print(f"First 10 Binomial Samples (n={num_trials}, p={prob_success}): {binomial_samples[:10]}")
# Example Output: First 10 Binomial Samples (n=10, p=0.5): [5 6 5 4 7 5 6 5 5 3]

Each number in the output represents the total number of successes obtained in one set of 10 trials. To visualize the distribution of these counts, we can create a histogram.

# Visualization (code using Matplotlib)
# plt.figure(figsize=(8, 4))
# plt.hist(binomial_samples, bins=np.arange(num_trials + 2) - 0.5, density=True, alpha=0.7, color='#15aabf', edgecolor='black')
# plt.title(f'Binomial Distribution Samples (n={num_trials}, p={prob_success})')
# plt.xlabel('Number of Successes')
# plt.ylabel('Probability Density')
# plt.xticks(range(num_trials + 1))
# plt.grid(axis='y')
# plt.show()

# Build the Plotly figure specification for the histogram and emit it as JSON
import json

hist_counts, bin_edges = np.histogram(binomial_samples, bins=np.arange(num_trials + 2) - 0.5, density=True)
bin_centers = 0.5 * (bin_edges[:-1] + bin_edges[1:])

binomial_fig = {"data": [{"type": "bar", "x": [str(int(x)) for x in bin_centers], "y": hist_counts.tolist(), "name": "Sample Frequency", "marker": {"color": "#15aabf", "line": {"color": "#495057", "width": 1}}}], "layout": {"title": {"text": "Simulated Binomial Distribution (n=10, p=0.5)"}, "xaxis": {"title": {"text": "Number of Successes"}}, "yaxis": {"title": {"text": "Estimated Probability"}}, "bargap": 0.1, "width": 600, "height": 400}}
print(json.dumps(binomial_fig))

A histogram of 1000 samples drawn from a Binomial distribution with $n=10$ trials and success probability $p=0.5$. The shape approximates the theoretical Binomial PMF, centered around $n \times p = 5$.
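To see how closely the histogram tracks the theoretical PMF, you can compare the empirical frequency of each count with stats.binom.pmf. This sketch uses a fixed seed (random_state=0, an arbitrary choice) and np.bincount to tally how often each count from 0 to $n$ occurs.

```python
import numpy as np
from scipy import stats

num_trials = 10
prob_success = 0.5
binomial_samples = stats.binom.rvs(n=num_trials, p=prob_success, size=1000, random_state=0)

# Empirical probability of each count 0..n (fraction of samples equal to k)
values = np.arange(num_trials + 1)
empirical = np.bincount(binomial_samples, minlength=num_trials + 1) / binomial_samples.size

# Theoretical PMF evaluated at the same counts
theoretical = stats.binom.pmf(values, n=num_trials, p=prob_success)

for k, emp, theo in zip(values, empirical, theoretical):
    print(f"{k:2d}: empirical {emp:.3f}  theoretical {theo:.3f}")

print("Largest absolute gap:", np.abs(empirical - theoretical).max())
```

With only 1000 samples, gaps of a percentage point or so per count are normal. They shrink as the sample size grows.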

Generating Samples from Continuous Distributions

Continuous distributions describe outcomes over a continuous range.

Uniform Distribution

The Uniform distribution assigns equal probability density to all outcomes within a specified range $[a, b)$. Outside this range, the density is zero.

We use stats.uniform.rvs(). It takes loc (the starting point, $a$) and scale (the width of the range, $b-a$) as parameters.
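This sketch makes the loc/scale mapping concrete by freezing a Uniform distribution with loc=5 and scale=5 and inspecting it. The support should run from 5 to 10. The mean should be the midpoint $(a+b)/2 = 7.5$, and the density should be the constant $1/(b-a) = 0.2$. The seed is an arbitrary choice.

```python
from scipy import stats

lower_bound, upper_bound = 5.0, 10.0
# loc is the left endpoint a; scale is the width b - a
uniform_dist = stats.uniform(loc=lower_bound, scale=upper_bound - lower_bound)

a, b = uniform_dist.support()
print(f"support: [{a}, {b}]")
print("mean:", uniform_dist.mean())         # midpoint (a + b) / 2
print("pdf at 7.0:", uniform_dist.pdf(7.0))  # constant density 1 / (b - a)

samples = uniform_dist.rvs(size=1000, random_state=0)
print("all samples in range:", samples.min() >= lower_bound and samples.max() <= upper_bound)
```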

# Parameters
lower_bound = 5.0   # Start of the interval (a)
upper_bound = 10.0  # End of the interval (b)
num_samples = 1000

# Calculate loc and scale
loc_param = lower_bound
scale_param = upper_bound - lower_bound

# Generate samples
uniform_samples = stats.uniform.rvs(loc=loc_param, scale=scale_param, size=num_samples)

print(f"First 10 Uniform Samples (range=[{lower_bound}, {upper_bound})): {np.round(uniform_samples[:10], 2)}")
# Example Output: First 10 Uniform Samples (range=[5.0, 10.0)): [7.82 9.21 5.34 6.78 8.89 5.01 9.98 7.11 6.05 8.43]

(Equivalent NumPy function: np.random.uniform(low=lower_bound, high=upper_bound, size=num_samples))

A histogram of these samples should appear roughly flat across the interval $[5, 10)$.

# Visualization (code using Matplotlib)
# plt.figure(figsize=(8, 4))
# plt.hist(uniform_samples, bins=20, density=True, alpha=0.7, color='#fd7e14', edgecolor='black')
# plt.title(f'Uniform Distribution Samples (range=[{lower_bound}, {upper_bound}))')
# plt.xlabel('Value')
# plt.ylabel('Probability Density')
# plt.grid(axis='y')
# plt.show()

# Build the Plotly figure specification for the histogram and emit it as JSON
import json

hist_counts, bin_edges = np.histogram(uniform_samples, bins=20, density=True)
bin_centers = 0.5 * (bin_edges[:-1] + bin_edges[1:])

uniform_fig = {"data": [{"type": "bar", "x": bin_centers.tolist(), "y": hist_counts.tolist(), "name": "Sample Density", "marker": {"color": "#fd7e14", "line": {"color": "#495057", "width": 1}}}], "layout": {"title": {"text": "Simulated Uniform Distribution (Range=[5, 10))"}, "xaxis": {"title": {"text": "Value"}}, "yaxis": {"title": {"text": "Estimated Density"}}, "bargap": 0.05, "width": 600, "height": 400}}
print(json.dumps(uniform_fig))

A histogram of 1000 samples drawn from a Uniform distribution over the interval $[5, 10)$. The density is approximately constant within this range.
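"Approximately constant" can be made precise. With density=True, np.histogram rescales bar heights so the total area is 1. For a Uniform distribution on $[a, b)$, every bar should therefore sit near $1/(b-a) = 0.2$. This sketch uses 100,000 samples rather than 1000 so the bars are visibly flatter. The sample size, seed, and variable name big_uniform_samples are illustrative choices.

```python
import numpy as np
from scipy import stats

lower_bound, upper_bound = 5.0, 10.0
big_uniform_samples = stats.uniform.rvs(loc=lower_bound, scale=upper_bound - lower_bound,
                                        size=100_000, random_state=0)

# density=True rescales bar heights so the total area under the histogram is 1
hist_density, _ = np.histogram(big_uniform_samples, bins=20, density=True)

expected_density = 1.0 / (upper_bound - lower_bound)  # 0.2 for [5, 10)
print("Expected density:", expected_density)
print("Min/max bar height:", hist_density.min(), hist_density.max())
```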

Normal (Gaussian) Distribution

The Normal distribution, often called the bell curve, is perhaps the most common continuous distribution. It's characterized by its mean ($\mu$, loc) and standard deviation ($\sigma$, scale). The distribution is symmetric around the mean.

We use stats.norm.rvs() with loc for the mean and scale for the standard deviation.
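One way to read loc and scale is as a shift and a stretch of the standard normal. A draw from $N(\mu, \sigma^2)$ is $\mu + \sigma Z$, where $Z \sim N(0, 1)$. This sketch checks that relationship by drawing with the same seed both ways. The values $\mu = 3$, $\sigma = 2$ and the seed 7 are arbitrary illustrative choices.

```python
import numpy as np
from scipy import stats

mean_val, std_dev = 3.0, 2.0

# Draw directly from a Normal with mean 3 and standard deviation 2
direct = stats.norm.rvs(loc=mean_val, scale=std_dev, size=5, random_state=7)

# Draw standard normal values with the same seed, then shift and scale them
z = stats.norm.rvs(size=5, random_state=7)
transformed = mean_val + std_dev * z

print(np.allclose(direct, transformed))  # True
```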

# Parameters
mean_val = 0.0      # Mean (mu)
std_dev = 1.0       # Standard Deviation (sigma)
num_samples = 1000

# Generate samples
normal_samples = stats.norm.rvs(loc=mean_val, scale=std_dev, size=num_samples)

print(f"First 10 Normal Samples (mean={mean_val}, std_dev={std_dev}): {np.round(normal_samples[:10], 2)}")
# Example Output: First 10 Normal Samples (mean=0.0, std_dev=1.0): [-0.54  1.25  0.21 -1.87  0.88 -0.76  0.33 -0.11 -0.45  1.05]

(Equivalent NumPy function: np.random.normal(loc=mean_val, scale=std_dev, size=num_samples))

A histogram of normal samples will show the characteristic bell shape, centered at the mean.

# Visualization (code using Matplotlib)
# plt.figure(figsize=(8, 4))
# plt.hist(normal_samples, bins=30, density=True, alpha=0.7, color='#4263eb', edgecolor='black')
# plt.title(f'Normal Distribution Samples (mean={mean_val}, std_dev={std_dev})')
# plt.xlabel('Value')
# plt.ylabel('Probability Density')
# plt.grid(axis='y')
# plt.show()

# Build the Plotly figure specification for the histogram and emit it as JSON
import json

hist_counts, bin_edges = np.histogram(normal_samples, bins=30, density=True)
bin_centers = 0.5 * (bin_edges[:-1] + bin_edges[1:])

normal_fig = {"data": [{"type": "bar", "x": bin_centers.tolist(), "y": hist_counts.tolist(), "name": "Sample Density", "marker": {"color": "#4263eb", "line": {"color": "#495057", "width": 1}}}], "layout": {"title": {"text": "Simulated Normal Distribution (Mean=0, StdDev=1)"}, "xaxis": {"title": {"text": "Value"}}, "yaxis": {"title": {"text": "Estimated Density"}}, "bargap": 0.05, "width": 600, "height": 400}}
print(json.dumps(normal_fig))

A histogram of 1000 samples drawn from a Standard Normal distribution ($\mu=0, \sigma=1$). The distribution clearly shows the characteristic bell shape centered at 0.
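Beyond the visual bell shape, the samples should also respect the familiar 68-95-99.7 rule. Roughly 68%, 95%, and 99.7% of values fall within 1, 2, and 3 standard deviations of the mean. This sketch compares sample fractions against the exact probabilities from stats.norm.cdf. It uses 100,000 samples and a fixed seed, both arbitrary choices, and the name many_normal is illustrative.

```python
import numpy as np
from scipy import stats

mean_val, std_dev = 0.0, 1.0
many_normal = stats.norm.rvs(loc=mean_val, scale=std_dev, size=100_000, random_state=0)

results = {}
for k in (1, 2, 3):
    # Fraction of samples that fall within k standard deviations of the mean
    sample_frac = np.mean(np.abs(many_normal - mean_val) <= k * std_dev)
    # Theoretical P(|X - mu| <= k * sigma) from the standard normal CDF
    theory_frac = stats.norm.cdf(k) - stats.norm.cdf(-k)
    results[k] = (sample_frac, theory_frac)
    print(f"within {k} sd: sample {sample_frac:.4f}, theory {theory_frac:.4f}")
```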

Being able to generate samples from these fundamental distributions is a practical skill. It allows you to simulate data that mirrors real-world phenomena characterized by these patterns, providing a basis for experiments, testing hypotheses, and understanding the inputs or outputs of machine learning models that rely on probabilistic assumptions. As you encounter more complex distributions, the process of sampling using libraries like SciPy remains similar.
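As a brief illustration of that last point, the same pattern applies to distributions not covered in this section: pick the distribution, set its parameters, and call .rvs(). The exponential (stats.expon) and Poisson (stats.poisson) distributions here are chosen only as examples. The parameter values and seeds are arbitrary.

```python
from scipy import stats

# Same call pattern: pick a distribution, set its parameters, call .rvs()
exponential_samples = stats.expon.rvs(scale=2.0, size=10_000, random_state=0)  # mean = scale = 2
poisson_samples = stats.poisson.rvs(mu=3.0, size=10_000, random_state=0)      # mean = mu = 3

print(exponential_samples.mean(), poisson_samples.mean())
```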