While knowing the center of your data (mean, median, mode) is informative, it doesn't tell the whole story. Imagine two cities: City A has daily temperatures over a week of {18, 19, 20, 20, 21, 22, 20} degrees Celsius, and City B has {10, 30, 5, 15, 35, 25, 0} degrees Celsius. If you calculate the mean temperature for each, you'll find 20°C for City A and about 17.1°C for City B (120/7), so the two averages are fairly close. However, the weather experience in these two cities is vastly different! City A is very consistent, while City B has wild temperature swings.
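The two means can be checked directly from the lists. The sketch below uses Python's standard `statistics` module; the variable names `city_a` and `city_b` are illustrative and not part of the original text.

```python
import statistics

# Daily temperatures in degrees Celsius, copied from the example above.
city_a = [18, 19, 20, 20, 21, 22, 20]
city_b = [10, 30, 5, 15, 35, 25, 0]

mean_a = statistics.mean(city_a)
mean_b = statistics.mean(city_b)

print(f"City A mean: {mean_a:.2f}")  # City A mean: 20.00
print(f"City B mean: {mean_b:.2f}")  # City B mean: 17.14
```

The averages alone give little hint of how differently the two lists behave, which is the gap that measures of spread fill.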
This is where measures of spread, also known as measures of dispersion or variability, become essential. They quantify how much the data points in your dataset tend to deviate from the average value. Let's look at the most common ways to measure this spread.
The simplest measure of spread is the range. It's calculated by subtracting the minimum value in the dataset from the maximum value.
$$\text{Range} = \text{Maximum Value} - \text{Minimum Value}$$

For City A: Range = 22°C - 18°C = 4°C

For City B: Range = 35°C - 0°C = 35°C
The range gives you a quick sense of the total span covered by your data. As you can see, City B has a much larger range than City A, reflecting its greater temperature variability.
While easy to calculate, the range has a significant drawback: it only considers the two most extreme values. A single very high or very low value (an outlier) can drastically affect the range, potentially giving a misleading picture of the overall data spread.
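As a minimal sketch of both points, the snippet below computes the range for each city and then shows how a single extreme reading changes it. The helper name `value_range` and the 45°C reading are hypothetical, chosen only for illustration.

```python
def value_range(values):
    """Return the maximum minus the minimum of a non-empty sequence."""
    return max(values) - min(values)

city_a = [18, 19, 20, 20, 21, 22, 20]
city_b = [10, 30, 5, 15, 35, 25, 0]

print(value_range(city_a))  # 4
print(value_range(city_b))  # 35

# Hypothetical outlier: swap City A's last reading for a faulty 45 °C value.
city_a_with_outlier = [18, 19, 20, 20, 21, 22, 45]
print(value_range(city_a_with_outlier))  # 27
```

Six of the seven readings are unchanged, yet the range jumps from 4 to 27 because it depends only on the two extremes.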
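To get a measure of spread that takes every data point into account, we use variance. Variance measures the average of the squared differences between each data point and the mean of the dataset. Squaring the differences serves two purposes: it makes every difference non-negative, so deviations above and below the mean cannot cancel each other out, and it gives larger deviations proportionally more weight.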
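Conceptually, for a dataset $x_1, x_2, \dots, x_n$ with mean $\bar{x}$, the variance involves these steps: compute the mean $\bar{x}$; subtract it from each data point to get the differences $x_i - \bar{x}$; square each difference; add up the squared differences; and divide that sum by the number of data points (or, for a sample, by one less than that number).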
There are slightly different formulas for population variance (denoted $\sigma^2$, pronounced "sigma squared") and sample variance (denoted $s^2$). For introductory purposes, the concept is the main focus. A common formula for sample variance is:
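$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

The division by $n - 1$ instead of $n$ is a technical adjustment used when estimating the population variance from a sample. Dividing by $n$ would tend to underestimate the population variance, and dividing by $n - 1$ corrects for that.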
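The formula translates almost line by line into code. This is a minimal sketch: the function name `variance_from_formula`, its `sample` flag, and the small dataset are assumptions made for illustration. The result is compared against `statistics.variance` (divides by $n - 1$) and `statistics.pvariance` (divides by $n$) from Python's standard library.

```python
import math
import statistics

def variance_from_formula(values, sample=True):
    """Variance computed step by step; divides by n - 1 when sample is True."""
    n = len(values)
    if n < 2 and sample:
        raise ValueError("sample variance needs at least two data points")
    mean = sum(values) / n
    squared_diffs = [(x - mean) ** 2 for x in values]
    divisor = n - 1 if sample else n
    return sum(squared_diffs) / divisor

# Hypothetical dataset with mean 5 and squared differences summing to 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]

sample_var = variance_from_formula(data)                     # 32 / 7
population_var = variance_from_formula(data, sample=False)   # 32 / 8

print(f"{sample_var:.4f}")      # 4.5714
print(f"{population_var:.4f}")  # 4.0000

# The hand-written version agrees with the standard library.
assert math.isclose(sample_var, statistics.variance(data))
assert math.isclose(population_var, statistics.pvariance(data))
```

The sample variance is always slightly larger than the population variance for the same numbers, and the gap shrinks as $n$ grows.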
Let's calculate the variance for City A (mean $\bar{x} = 20$):

Differences: (18-20), (19-20), (20-20), (20-20), (21-20), (22-20), (20-20) = {-2, -1, 0, 0, 1, 2, 0}

Squared differences: {4, 1, 0, 0, 1, 4, 0}

Sum of squared differences: 4 + 1 + 0 + 0 + 1 + 4 + 0 = 10

Sample variance: $s^2 = 10/(7-1) = 10/6 \approx 1.67$
Now for City B (mean $\bar{x} = 120/7 \approx 17.14$):

Differences: (10-17.14), (30-17.14), (5-17.14), (15-17.14), (35-17.14), (25-17.14), (0-17.14) = {-7.14, 12.86, -12.14, -2.14, 17.86, 7.86, -17.14}

Squared differences (from the unrounded differences): {51.02, 165.31, 147.45, 4.59, 318.88, 61.73, 293.88}

Sum of squared differences: ≈ 1042.86

Sample variance: $s^2 = 1042.86/(7-1) \approx 173.81$

As expected, the variance for City B (173.81) is much higher than for City A (1.67), indicating greater spread.
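Hand calculations like these are easy to get wrong, so it can help to recompute them from the raw lists. The sketch below does that with the standard `statistics` module; the `results` dictionary and its keys are illustrative names, not anything defined in the text.

```python
import statistics

cities = {
    "City A": [18, 19, 20, 20, 21, 22, 20],
    "City B": [10, 30, 5, 15, 35, 25, 0],
}

results = {}
for name, temps in cities.items():
    mean = statistics.fmean(temps)
    # Sum of squared differences from the mean (the numerator of the formula).
    sum_sq = sum((t - mean) ** 2 for t in temps)
    # statistics.variance divides by n - 1, i.e. it is the sample variance.
    var = statistics.variance(temps)
    results[name] = {"mean": mean, "sum_sq": sum_sq, "variance": var}
    print(f"{name}: mean={mean:.2f}, sum of squares={sum_sq:.2f}, variance={var:.2f}")

# City A: mean=20.00, sum of squares=10.00, variance=1.67
# City B: mean=17.14, sum of squares=1042.86, variance=173.81
```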
A limitation of variance is its units. If our original data was in degrees Celsius (°C), the variance is in degrees Celsius squared (°C²). This isn't very intuitive to interpret directly in relation to the original data.
This leads us to the most commonly used measure of spread: the standard deviation. It's simply the square root of the variance.
$$\text{Standard Deviation} = \sqrt{\text{Variance}}$$

For population standard deviation, it's denoted by $\sigma$ (sigma), and for sample standard deviation, it's $s$.
$$s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$

The primary advantage of the standard deviation is that it brings the measure of spread back into the original units of the data.
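For City A: $s = \sqrt{1.67} \approx 1.29$ °C

For City B (variance taken about its mean of $120/7 \approx 17.14$ °C): $s = \sqrt{173.81} \approx 13.18$ °C

Now we can say something more intuitive: City A's daily temperatures typically deviate from the mean by about 1.3°C, while City B's typically deviate by about 13.2°C.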
This clearly shows the much larger variability in City B's temperatures, expressed in understandable units.
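The same figures can be obtained in code. This is a small sketch using `statistics.stdev`, which computes the sample standard deviation (dividing by $n - 1$); the variable names are illustrative.

```python
import math
import statistics

city_a = [18, 19, 20, 20, 21, 22, 20]
city_b = [10, 30, 5, 15, 35, 25, 0]

sd_a = statistics.stdev(city_a)
sd_b = statistics.stdev(city_b)

print(f"City A: {sd_a:.2f} °C")  # City A: 1.29 °C
print(f"City B: {sd_b:.2f} °C")  # City B: 13.18 °C

# The standard deviation is the square root of the variance.
assert math.isclose(sd_a, math.sqrt(statistics.variance(city_a)))
assert math.isclose(sd_b, math.sqrt(statistics.variance(city_b)))
```

One practical caveat when moving to other libraries: NumPy's `np.std` and `np.var` divide by $n$ by default (`ddof=0`), so `ddof=1` is needed to reproduce the sample values above, whereas `statistics.stdev` already uses $n - 1$.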
A smaller standard deviation indicates that data points tend to be close to the mean (low variability, high consistency). A larger standard deviation indicates that data points are spread out over a wider range of values (high variability, low consistency).
This histogram compares the temperature distributions for City A and City B. The two distributions are centered at similar temperatures, but the spread (indicated by the width of the distribution and the standard deviation, SD) is much larger for City B (orange) than for City A (blue).
Understanding these measures of spread (range, variance, and especially standard deviation) is fundamental for describing datasets beyond just their central tendency. They provide insights into the consistency and variability within your data, which is often just as important as the average value.
© 2025 ApX Machine Learning