Now that we understand random variables map outcomes from a sample space to numerical values, we often want to summarize the characteristics of these variables. Just knowing the possible values isn't enough; we need ways to describe the "center" and "spread" of the distribution of these values. Expected value and variance are the fundamental measures that provide this summary.
The expected value, denoted $E[X]$ or sometimes $\mu_X$ (or simply $\mu$), represents the weighted average of the possible values a random variable $X$ can take, where the weights are the probabilities of those values. Intuitively, if you were to repeat an experiment involving $X$ many times and calculate the average of the outcomes, that average would converge to the expected value $E[X]$. It's like the "center of mass" of the probability distribution.
The calculation differs slightly for discrete and continuous random variables:
For a Discrete Random Variable X: If $X$ can take values $x_1, x_2, \dots, x_n$ with corresponding probabilities $P(X=x_1), P(X=x_2), \dots, P(X=x_n)$, the expected value is:
$$E[X] = \sum_{i} x_i \, P(X = x_i)$$
The sum is taken over all possible values $x_i$.
For a Continuous Random Variable X: If $X$ has a probability density function (PDF) $f(x)$, the expected value is calculated by integrating the product of $x$ and $f(x)$ over the entire range of $X$:
$$E[X] = \int_{-\infty}^{\infty} x \, f(x) \, dx$$
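As a rough illustration of the two formulas, the sketch below computes a discrete expected value as a weighted sum and approximates a continuous one with a simple midpoint rule. The function names, the biased coin, and the Uniform(0, 2) density are all assumptions chosen for the example, not part of the text above.

```python
from fractions import Fraction


def expected_value_discrete(pmf):
    """Weighted average of values, with probabilities as weights.

    `pmf` maps each possible value x_i to P(X = x_i).
    """
    return sum(x * p for x, p in pmf.items())


def expected_value_continuous(pdf, lower, upper, steps=100_000):
    """Approximate the integral of x * f(x) over [lower, upper] (midpoint rule)."""
    width = (upper - lower) / steps
    total = 0.0
    for k in range(steps):
        x = lower + (k + 0.5) * width  # midpoint of the k-th slice
        total += x * pdf(x) * width
    return total


# Hypothetical discrete example: a biased coin paying 1 with probability 0.3, else 0.
coin_mean = expected_value_discrete({0: Fraction(7, 10), 1: Fraction(3, 10)})

# Hypothetical continuous example: Uniform(0, 2), whose density is 1/2 on [0, 2].
uniform_mean = expected_value_continuous(lambda x: 0.5, 0.0, 2.0)

print(coin_mean)               # 3/10
print(round(uniform_mean, 6))  # 1.0
```

The integration bounds are finite here because the example density is zero outside [0, 2]; a density with unbounded support would need a truncated range or a proper numerical integration routine.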
Example: Fair Six-Sided Die
Let X be the random variable representing the outcome of rolling a fair six-sided die. The possible values are {1,2,3,4,5,6}, and each has a probability of 1/6.
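The expected value is:
$$E[X] = 1\cdot\frac{1}{6} + 2\cdot\frac{1}{6} + 3\cdot\frac{1}{6} + 4\cdot\frac{1}{6} + 5\cdot\frac{1}{6} + 6\cdot\frac{1}{6}$$
$$E[X] = \frac{1+2+3+4+5+6}{6} = \frac{21}{6} = 3.5$$
Notice that the expected value (3.5) is not a value the die can actually land on. It's the long-term average outcome over many rolls.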
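The "long-term average" interpretation can be checked with a small simulation. This is a minimal sketch using only the standard library; the seed and the number of rolls are arbitrary choices for illustration.

```python
import random
from fractions import Fraction

faces = [1, 2, 3, 4, 5, 6]

# Exact expected value: each face has probability 1/6.
exact_mean = sum(Fraction(face, 6) for face in faces)

# Empirical check: average many simulated rolls (seed fixed for repeatability).
rng = random.Random(0)
num_rolls = 100_000
sample_mean = sum(rng.choice(faces) for _ in range(num_rolls)) / num_rolls

print(exact_mean)   # 7/2
print(sample_mean)  # close to 3.5; the exact digits depend on the seed
```

Increasing `num_rolls` should bring the sample mean closer to 3.5 on average, though any single run can still land slightly above or below it.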
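Expected value has some useful linear properties. For constants $a$, $b$, and $c$, and random variables $X$ and $Y$:
Constant: $E[c] = c$
Scaling and shifting: $E[aX + b] = aE[X] + b$
Additivity: $E[X + Y] = E[X] + E[Y]$, whether or not $X$ and $Y$ are independent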
These properties are extremely useful for simplifying calculations involving combinations of random variables.
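Two standard linearity identities, $E[aX + b] = aE[X] + b$ and $E[X + Y] = E[X] + E[Y]$, can be verified exactly for fair dice. The sketch below is one way to do that; the constants `a = 2` and `b = 3` are arbitrary choices for the example.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
p = Fraction(1, 6)  # probability of each face of a fair die

mean_x = sum(x * p for x in faces)  # E[X] = 7/2

# E[aX + b] computed directly from the distribution of X.
a, b = 2, 3
mean_affine = sum((a * x + b) * p for x in faces)

# E[X + Y] computed over all 36 equally likely outcomes of two dice.
mean_sum = sum((x + y) * p * p for x, y in product(faces, repeat=2))

print(mean_affine, a * mean_x + b)  # 10 10
print(mean_sum, mean_x + mean_x)    # 7 7
```

The two dice here happen to be independent, but additivity of expectation does not rely on that.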
While expected value tells us about the center of a distribution, it doesn't tell us how spread out the values are. Are the values tightly clustered around the mean, or are they widely dispersed? Variance measures this spread.
The variance of a random variable $X$, denoted $\mathrm{Var}(X)$ or $\sigma_X^2$ (or simply $\sigma^2$), is the expected value of the squared difference between the random variable and its expected value $\mu = E[X]$.
$$\mathrm{Var}(X) = E[(X - \mu)^2]$$
A higher variance means the values of X tend to be further away from the mean, on average. A lower variance means they tend to be closer to the mean.
Similar to expected value, the calculation depends on whether the variable is discrete or continuous:
For a Discrete Random Variable X: Using the definition $\mu = E[X]$:
$$\mathrm{Var}(X) = \sum_{i} (x_i - \mu)^2 \, P(X = x_i)$$
For a Continuous Random Variable X: Using the definition $\mu = E[X]$ and PDF $f(x)$:
$$\mathrm{Var}(X) = \int_{-\infty}^{\infty} (x - \mu)^2 f(x) \, dx$$
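For the continuous case, the integral can be approximated numerically. The sketch below assumes a Uniform(0, 2) density purely as an example and uses a hand-written midpoint rule; `integrate_midpoint` is a hypothetical helper, not a library function.

```python
def integrate_midpoint(func, lower, upper, steps=100_000):
    """Approximate the integral of func over [lower, upper] with the midpoint rule."""
    width = (upper - lower) / steps
    return sum(func(lower + (k + 0.5) * width) for k in range(steps)) * width


# Hypothetical example: Uniform(0, 2), with density f(x) = 1/2 on [0, 2].
def pdf(x):
    return 0.5


lower, upper = 0.0, 2.0

# Mean first, then the expected squared deviation from that mean.
mu = integrate_midpoint(lambda x: x * pdf(x), lower, upper)
variance = integrate_midpoint(lambda x: (x - mu) ** 2 * pdf(x), lower, upper)

print(round(mu, 6))        # 1.0
print(round(variance, 6))  # 0.333333
```

The result agrees with the known variance of a uniform distribution on an interval of width 2, which is $2^2/12 = 1/3$.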
There's often a more convenient computational formula derived from the definition:
$$\mathrm{Var}(X) = E[X^2] - (E[X])^2$$
To use this, you first calculate $E[X]$ (the mean) and $E[X^2]$ (the expected value of $X$ squared), then plug them into the formula. Remember, $E[X^2] = \sum_{i} x_i^2 \, P(X = x_i)$ for discrete variables and $E[X^2] = \int_{-\infty}^{\infty} x^2 f(x) \, dx$ for continuous variables.
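The computational formula follows from expanding the square in the definition and applying linearity of expectation, treating $\mu = E[X]$ as a constant:

```latex
\begin{aligned}
\mathrm{Var}(X) &= E[(X-\mu)^2] \\
&= E[X^2 - 2\mu X + \mu^2] \\
&= E[X^2] - 2\mu E[X] + \mu^2 \\
&= E[X^2] - 2\mu^2 + \mu^2 \\
&= E[X^2] - (E[X])^2
\end{aligned}
```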
The variance is measured in squared units of the original random variable (e.g., if $X$ is in meters, $\mathrm{Var}(X)$ is in meters squared). This can be hard to interpret directly. The standard deviation, denoted $\sigma_X$ or $\mathrm{SD}(X)$ (or simply $\sigma$), is the positive square root of the variance:
$$\sigma = \sqrt{\mathrm{Var}(X)}$$
The standard deviation is measured in the same units as the original random variable X, making it more intuitive to understand the typical deviation from the mean.
Example: Fair Six-Sided Die (Continued)
We found E[X]=3.5. Let's calculate the variance using the definition:
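$$\begin{aligned}
\mathrm{Var}(X) &= (1-3.5)^2\cdot\frac{1}{6} + (2-3.5)^2\cdot\frac{1}{6} + (3-3.5)^2\cdot\frac{1}{6} + (4-3.5)^2\cdot\frac{1}{6} + (5-3.5)^2\cdot\frac{1}{6} + (6-3.5)^2\cdot\frac{1}{6} \\
&= \frac{1}{6}\left[(-2.5)^2 + (-1.5)^2 + (-0.5)^2 + (0.5)^2 + (1.5)^2 + (2.5)^2\right] \\
&= \frac{1}{6}\left[6.25 + 2.25 + 0.25 + 0.25 + 2.25 + 6.25\right] \\
&= \frac{1}{6}\left[17.5\right] \approx 2.917
\end{aligned}$$
Alternatively, using the computational formula $\mathrm{Var}(X) = E[X^2] - (E[X])^2$: First, calculate $E[X^2]$:
$$E[X^2] = 1^2\cdot\frac{1}{6} + 2^2\cdot\frac{1}{6} + 3^2\cdot\frac{1}{6} + 4^2\cdot\frac{1}{6} + 5^2\cdot\frac{1}{6} + 6^2\cdot\frac{1}{6} = \frac{1+4+9+16+25+36}{6} = \frac{91}{6} \approx 15.167$$
Now, calculate the variance:
$$\mathrm{Var}(X) = E[X^2] - (E[X])^2 = \frac{91}{6} - (3.5)^2 = \frac{91}{6} - 12.25 = \frac{91}{6} - \frac{73.5}{6} = \frac{17.5}{6} \approx 2.917$$
Both methods yield the same result.
The standard deviation is:
$$\sigma = \sqrt{\mathrm{Var}(X)} = \sqrt{\frac{17.5}{6}} \approx \sqrt{2.917} \approx 1.708$$
So, for a fair die roll, the expected outcome is 3.5, and the outcomes typically deviate from this mean by about 1.708.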
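The die calculation can be reproduced exactly in a few lines. The sketch below uses `fractions.Fraction` to avoid rounding, computes the variance both from the definition and from the computational formula, and then takes the square root; the variable names are illustrative.

```python
from fractions import Fraction
from math import sqrt

faces = range(1, 7)
p = Fraction(1, 6)  # probability of each face of a fair die

mean = sum(x * p for x in faces)  # E[X] = 7/2

# Definition: E[(X - mu)^2]
var_definition = sum((x - mean) ** 2 * p for x in faces)

# Computational formula: E[X^2] - (E[X])^2
mean_of_squares = sum(x ** 2 * p for x in faces)  # E[X^2] = 91/6
var_shortcut = mean_of_squares - mean ** 2

std_dev = sqrt(var_definition)

print(var_definition, var_shortcut)  # 35/12 35/12
print(round(std_dev, 3))             # 1.708
```

The exact value 35/12 is the same quantity as 17.5/6, roughly 2.917.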
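Variance also has important properties. For constants $a$, $b$, and $c$:
Constant: $\mathrm{Var}(c) = 0$
Scaling and shifting: $\mathrm{Var}(aX + b) = a^2 \, \mathrm{Var}(X)$
Sum of independent variables: if $X$ and $Y$ are independent, $\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)$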
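Two standard variance identities, $\mathrm{Var}(aX + b) = a^2 \, \mathrm{Var}(X)$ and, for independent $X$ and $Y$, $\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)$, can be checked exactly with fair dice. This is a sketch under those assumptions; the `variance` helper and the constants `a = 2`, `b = 3` are hypothetical choices for the example.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
p = Fraction(1, 6)  # probability of each face of a fair die


def variance(values_with_probs):
    """Variance of a discrete distribution given (value, probability) pairs."""
    pairs = list(values_with_probs)
    mean = sum(v * q for v, q in pairs)
    return sum((v - mean) ** 2 * q for v, q in pairs)


var_x = variance((x, p) for x in faces)  # 35/12

# Shifting by b leaves the variance unchanged; scaling by a multiplies it by a^2.
a, b = 2, 3
var_affine = variance((a * x + b, p) for x in faces)

# Two independent dice: each of the 36 outcomes has probability 1/36.
var_sum = variance((x + y, p * p) for x, y in product(faces, repeat=2))

print(var_affine, a ** 2 * var_x)  # 35/3 35/3
print(var_sum, 2 * var_x)          # 35/6 35/6
```

Unlike additivity of expectation, the sum rule for variance depends on independence (more precisely, on zero covariance), which is why the joint probabilities are built as products here.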
Understanding expected value and variance is fundamental. They provide concise summaries of a probability distribution's central tendency and dispersion, forming the basis for many concepts in statistics and machine learning, from evaluating estimators to understanding uncertainty in predictions. In later sections, we'll see how Python libraries like NumPy and SciPy make calculating these values straightforward for various distributions.
© 2025 ApX Machine Learning