After finding the center of our data using measures like the mean or median, the next logical question is: how spread out are the data points? Do they cluster tightly around the center, or are they widely scattered? Measures of dispersion (also called spread or variability) help us answer this. The simplest measure of dispersion is the range.
The range is the difference between the highest value (maximum) and the lowest value (minimum) in a dataset. It gives you a quick sense of the total span covered by your data.
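Calculating the range is straightforward: find the maximum value, find the minimum value, and subtract the minimum from the maximum.
The formula is:
$$\text{Range} = \text{Maximum Value} - \text{Minimum Value}$$
Imagine we have the following daily high temperatures (in Celsius) recorded over a week: [21, 25, 19, 28, 22, 26, 20]. The maximum is 28°C and the minimum is 19°C, so the range is 28 − 19 = 9°C.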
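As a minimal sketch of that calculation, the snippet below uses only Python's built-in `max` and `min` on the week of temperatures from the example. The variable names are illustrative choices, not taken from any library.

```python
# Daily high temperatures (°C) from the example above
temperatures = [21, 25, 19, 28, 22, 26, 20]

# The range is the maximum minus the minimum
highest = max(temperatures)
lowest = min(temperatures)
value_range = highest - lowest

print(f"Maximum: {highest}, Minimum: {lowest}, Range: {value_range}")
```

For this data the maximum is 28 and the minimum is 19, so the printed range is 9.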
A larger range indicates greater spread or variability in the data, meaning the values are scattered over a wider interval. A smaller range suggests the data points are closer together, indicating less variability.
While the range is easy to calculate and understand, it has a significant limitation: it only considers the two most extreme values in the dataset. This makes it very sensitive to outliers (unusually high or low values).
Consider our temperature data again. What if one day were unusually hot, giving us the temperatures [21, 25, 19, 45, 22, 26, 20]?
Now the range is 26°C, which is much larger than the previous 9°C. The single outlier (45°C) dramatically increased the range, potentially giving a misleading impression of the typical day-to-day temperature variation. The bulk of the data might still be clustered together, but the range doesn't reflect that.
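One rough way to see how much a single value drives the range is to recompute it with the hottest day set aside. The sketch below does that in plain Python. Dropping the largest value is only an illustration of the effect here, not a recommended way to handle outliers, and the variable names are hypothetical.

```python
# Temperatures (°C) including the unusually hot 45°C day
temperatures = [21, 25, 19, 45, 22, 26, 20]

# Range of the full dataset, outlier included
full_range = max(temperatures) - min(temperatures)

# Sort the values and drop the single largest one to inspect the other six days
remaining = sorted(temperatures)[:-1]
remaining_range = max(remaining) - min(remaining)

print(f"Range with all seven days: {full_range}")
print(f"Range without the hottest day: {remaining_range}")
```

The other six days span only 7°C (19 to 26), while the full dataset spans 26°C, so most of the reported range comes from one reading.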
In Python, you can easily calculate the range using libraries like NumPy or Pandas.
```python
import numpy as np

temperatures = np.array([21, 25, 19, 28, 22, 26, 20])

# Calculate range using NumPy functions
data_range = np.ptp(temperatures)  # ptp stands for "peak to peak"

# Alternatively, calculate manually
max_temp = np.max(temperatures)
min_temp = np.min(temperatures)
manual_range = max_temp - min_temp

print(f"Temperatures: {temperatures}")
print(f"Maximum Temperature: {max_temp}")
print(f"Minimum Temperature: {min_temp}")
print(f"Range (using np.ptp): {data_range}")
print(f"Range (manual calculation): {manual_range}")

# Example with outlier
temperatures_with_outlier = np.array([21, 25, 19, 45, 22, 26, 20])
range_with_outlier = np.ptp(temperatures_with_outlier)

print(f"\nTemperatures with outlier: {temperatures_with_outlier}")
print(f"Range with outlier: {range_with_outlier}")
```
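The text mentions Pandas as well, so here is a comparable sketch using a `pandas.Series`. It assumes pandas is installed and computes the range as `max()` minus `min()`. The `describe()` call is included only because its summary reports the minimum and maximum alongside other statistics.

```python
import pandas as pd

# Same week of daily high temperatures (°C), stored as a pandas Series
temperatures = pd.Series([21, 25, 19, 28, 22, 26, 20])

# Range as the difference between the Series maximum and minimum
temperature_range = temperatures.max() - temperatures.min()
print(f"Range (pandas): {temperature_range}")

# describe() includes 'min' and 'max' entries in its summary
summary = temperatures.describe()
print(summary[["min", "max"]])
```

The same `max()` minus `min()` pattern can be applied to a single DataFrame column, since selecting one column yields a Series.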
Because of its sensitivity to outliers, the range is often used as a quick preliminary check rather than the sole measure of spread. In the following sections, we'll look at variance and standard deviation, which take every data point into account instead of only the two extremes. They are still influenced by extreme values, but they describe how the data as a whole spreads around the center.
© 2025 ApX Machine Learning