In the previous chapter, we looked at calculating probabilities for specific events. Now, we'll explore how to describe the probabilities of all possible outcomes of a random process. This organized description is what we call a probability distribution. It's a fundamental concept for understanding variability and is used extensively in machine learning for modeling data and uncertainty.
Before diving into distributions, let's clarify what they describe: random variables. A random variable is essentially a variable whose value is a numerical outcome determined by chance. Think of it as a way to map the outcomes of a random experiment (like flipping a coin or measuring someone's height) to numbers.
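This mapping from raw outcomes to numbers can be made concrete in a few lines of Python (a minimal sketch; the 0/1 encoding of tails/heads is one common convention, not the only one):

```python
import random

# A random variable for a fair coin flip: map the raw outcomes
# "heads"/"tails" to the numbers 1/0.
def coin_flip():
    outcome = random.choice(["heads", "tails"])
    return 1 if outcome == "heads" else 0

# Each call yields a numerical outcome determined by chance.
sample = coin_flip()  # either 0 or 1
```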
A probability distribution then specifies the probability for each possible value that the random variable can take. It's a mathematical function that provides the probabilities of occurrence of different possible outcomes in an experiment. Imagine you have a total probability of 1 (or 100%) to distribute among all possible outcomes; the probability distribution tells you exactly how that probability is allocated.
Probability distributions generally fall into two main categories, based on the type of random variable they describe:
Discrete distributions describe random variables that can only take on a countable set of specific, separate values, often integers. You can list the possible outcomes one by one.
For discrete distributions, we can list each possible value the random variable can take and assign a probability to each one. The sum of all these probabilities must equal 1. We often use a Probability Mass Function (PMF) to define this relationship, which we'll discuss in the next section.
Consider the simple example of rolling a fair six-sided die. Let the random variable Y be the number shown. There are six possible, equally likely outcomes, and the probability distribution assigns a probability to each of them.
Each possible outcome (1 through 6) has an equal probability of 1/6 ≈ 0.167. This is an example of a discrete uniform distribution.
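A sketch of this distribution as a PMF, using exact fractions so we can verify that the probabilities sum to 1:

```python
from fractions import Fraction

# Discrete uniform distribution for a fair six-sided die:
# each outcome 1..6 is assigned probability 1/6.
pmf = {face: Fraction(1, 6) for face in range(1, 7)}

# The probabilities over all possible outcomes must sum to 1.
total = sum(pmf.values())
print(total)        # 1
print(pmf[3])       # 1/6
```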
Continuous distributions describe random variables that can take on any value within a given range or interval. You can't simply list all possible values because there are infinitely many.
For continuous distributions, the probability of the random variable taking on any single, exact value is actually zero (think about the chance of someone being exactly 175.0000... cm tall). Instead, we talk about the probability of the variable falling within a specific interval. For instance, what's the probability that someone's height is between 170 cm and 180 cm? These distributions are described using a Probability Density Function (PDF), which we will cover later in this chapter. The PDF helps determine the likelihood of the variable falling within a range of values; the area under the curve of the PDF over an interval corresponds to the probability of the variable being in that interval.
Imagine the distribution of adult heights. It often follows a bell shape, where heights near the average are more likely, and very short or very tall heights are less likely. We can't assign a probability to exactly 175cm, but we can find the probability of being between, say, 174cm and 176cm by looking at the area under the curve in that range.
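The interval probability described above can be computed as a difference of cumulative probabilities. The sketch below models heights with a normal distribution; the mean of 175 cm and standard deviation of 7 cm are illustrative assumptions, not figures from the text, and the CDF is built from the standard error function to keep the example self-contained:

```python
import math

# Illustrative assumptions for adult heights (not from the text).
MEAN, SD = 175.0, 7.0

def normal_cdf(x, mu, sigma):
    """P(X <= x) for a normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

# P(174 cm < height < 176 cm): the area under the PDF over that
# interval, obtained as a difference of CDF values.
p = normal_cdf(176, MEAN, SD) - normal_cdf(174, MEAN, SD)
print(round(p, 4))
```

Note that the probability of any single exact height, such as `normal_cdf(175, ...) - normal_cdf(175, ...)`, is exactly zero, matching the discussion above.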
Understanding probability distributions is important throughout data analysis and machine learning, where they are used to model how data is generated and to quantify the uncertainty in estimates and predictions.
In the following sections, we will look more closely at the functions used to define discrete (PMF) and continuous (PDF) distributions, and examine some of the most common distributions encountered in practice.
© 2025 ApX Machine Learning