Now that we've discussed the fundamental concepts of descriptive statistics, let's put them into practice. This hands-on exercise will guide you through calculating the mean, median, mode, range, variance, and standard deviation for a small dataset. Performing these calculations manually, at least once, helps solidify your understanding of what these values represent before relying on software tools.
Imagine a small class took a short quiz, and their scores (out of 100) are as follows:
[85, 90, 78, 92, 85, 88, 76, 95, 85, 90]
This list represents our dataset. Let's analyze these scores using the statistics we've learned.
These statistics help us understand the "center" or typical value of the data.
The mean is the sum of all values divided by the number of values. The formula is: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$, where $x_i$ represents each score and $n$ is the total number of scores.
The ten scores sum to 864, so the mean is 864 / 10 = 86.4. The average score for this quiz is 86.4.
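As a quick check of the arithmetic, the mean can be sketched in plain Python. The variable names (`scores`, `total`, `n`, `mean`) are illustrative choices, not part of the original exercise.

```python
# Quiz scores from the example dataset.
scores = [85, 90, 78, 92, 85, 88, 76, 95, 85, 90]

total = sum(scores)   # sum of all values
n = len(scores)       # number of values
mean = total / n      # arithmetic mean

print(f"sum = {total}, n = {n}, mean = {mean}")
```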
The median is the middle value when the data is sorted. If there's an even number of data points, it's the average of the two middle values.
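Sorted in ascending order, the scores are:
[76, 78, 85, 85, 85, 88, 90, 90, 92, 95]
With 10 values, the two middle values are the 5th and 6th (85 and 88), so the median is (85 + 88) / 2 = 86.5. Half the students scored below 86.5, and half scored above.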
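The same procedure (sort, then take the middle value or the average of the two middle values) can be sketched as follows. This is a minimal illustration with made-up variable names, handling both the even and odd cases.

```python
# Quiz scores from the example dataset.
scores = [85, 90, 78, 92, 85, 88, 76, 95, 85, 90]

ordered = sorted(scores)
n = len(ordered)
mid = n // 2

if n % 2 == 0:
    # Even count: average the two middle values.
    median = (ordered[mid - 1] + ordered[mid]) / 2
else:
    # Odd count: take the single middle value.
    median = ordered[mid]

print(ordered)
print("median =", median)
```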
The mode is the value that appears most often in the dataset.
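Looking at the sorted scores:
[76, 78, 85, 85, 85, 88, 90, 90, 92, 95]
The score 85 appears three times, 90 appears twice, and every other score appears once. The mode score is 85.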
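Counting occurrences is easy to get wrong by eye, so here is a small sketch that tallies each score using `collections.Counter` from the Python standard library. It collects every value tied for the highest count, since a dataset can have more than one mode; the names `counts`, `highest_count`, and `modes` are illustrative.

```python
from collections import Counter

# Quiz scores from the example dataset.
scores = [85, 90, 78, 92, 85, 88, 76, 95, 85, 90]

counts = Counter(scores)               # score -> number of occurrences
highest_count = max(counts.values())   # frequency of the most common score

# Keep every score that reaches the highest frequency (there may be ties).
modes = sorted(score for score, c in counts.items() if c == highest_count)

print("counts:", dict(counts))
print("mode(s):", modes)
```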
These statistics tell us how spread out or dispersed the data points are.
The range is the difference between the highest and lowest values.
The highest score is 95 and the lowest is 76, so the range is 95 − 76 = 19. The scores span a range of 19 points.
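In code, the range is a one-line subtraction. The sketch below uses Python's built-in `min` and `max`; the variable names are illustrative.

```python
# Quiz scores from the example dataset.
scores = [85, 90, 78, 92, 85, 88, 76, 95, 85, 90]

lowest = min(scores)
highest = max(scores)
score_range = highest - lowest   # spread between the extremes

print(f"{highest} - {lowest} = {score_range}")
```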
Variance measures the average squared difference of each score from the mean. We use the sample variance formula (dividing by n−1) because our scores represent a sample of potential student performance.
The formula is: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$
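Let's break this down. First, subtract the mean (86.4) from each score and square the result; for example, (85 − 86.4)² = 1.96 and (76 − 86.4)² = 108.16. Next, add up the ten squared deviations, which gives 318.4. Finally, divide by n − 1 = 9: 318.4 / 9 ≈ 35.38.
The sample variance is approximately 35.38. This value is in "squared points," which isn't very intuitive.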
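The step-by-step variance calculation can be mirrored in a short sketch. It follows the sample formula (dividing by n − 1); the intermediate names such as `squared_deviations` and `sum_of_squares` are chosen here for readability.

```python
# Quiz scores from the example dataset.
scores = [85, 90, 78, 92, 85, 88, 76, 95, 85, 90]

n = len(scores)
mean = sum(scores) / n

# Step 1: squared deviation of each score from the mean.
squared_deviations = [(x - mean) ** 2 for x in scores]

# Step 2: sum of the squared deviations.
sum_of_squares = sum(squared_deviations)

# Step 3: divide by n - 1 (sample variance), not n.
sample_variance = sum_of_squares / (n - 1)

print(f"sum of squared deviations = {sum_of_squares:.2f}")
print(f"sample variance = {sample_variance:.2f}")
```

Dividing by `n` instead would give the population variance, which is slightly smaller for the same data.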
Standard deviation is the square root of the variance. It gives us a measure of spread in the original units (quiz points).
The formula is: $s = \sqrt{s^2}$
Taking the square root of the variance gives √35.38 ≈ 5.95. The sample standard deviation is approximately 5.95 points. This suggests that scores typically deviate from the mean score of 86.4 by roughly 6 points.
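A minimal sketch of this last step, recomputing the sample variance and then taking its square root with `math.sqrt` from the standard library:

```python
import math

# Quiz scores from the example dataset.
scores = [85, 90, 78, 92, 85, 88, 76, 95, 85, 90]

n = len(scores)
mean = sum(scores) / n

# Sample variance (n - 1 in the denominator), as in the formula above.
sample_variance = sum((x - mean) ** 2 for x in scores) / (n - 1)

# Standard deviation is the square root of the variance,
# which brings the result back to quiz points.
sample_std = math.sqrt(sample_variance)

print(f"sample standard deviation = {sample_std:.2f}")
```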
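For our dataset [85, 90, 78, 92, 85, 88, 76, 95, 85, 90], the results are: mean 86.4, median 86.5, mode 85, range 19, sample variance approximately 35.38, and sample standard deviation approximately 5.95.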
The mean and median are very close, suggesting the distribution of scores is relatively symmetric around the center. The mode is slightly lower. The standard deviation gives us a sense of the typical spread around the average score.
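We can also look at the frequency of scores within certain ranges (bins). Grouping the scores into bins of width 5 gives: 75-79 with 2 scores, 80-84 with 0 scores, 85-89 with 4 scores, 90-94 with 3 scores, and 95-99 with 1 score.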
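The binning can be sketched in a few lines of Python. The first bin is assumed to start at 75, which matches the 85-89 bin named in the histogram description; the text-based bar output is only a stand-in for a real plot.

```python
# Quiz scores from the example dataset.
scores = [85, 90, 78, 92, 85, 88, 76, 95, 85, 90]

bin_width = 5
first_edge = 75   # assumed lower edge of the first bin
last_edge = 100   # exclusive upper limit for bin starts

bin_counts = {}
for lower in range(first_edge, last_edge, bin_width):
    upper = lower + bin_width - 1            # inclusive upper bound, e.g. 75-79
    label = f"{lower}-{upper}"
    bin_counts[label] = sum(lower <= s <= upper for s in scores)

# Print a simple text histogram.
for label, count in bin_counts.items():
    print(f"{label}: {'#' * count} ({count})")
```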
This frequency distribution can be visualized using a histogram:
Histogram showing the frequency of student scores within 5-point intervals. The tallest bar corresponds to the 85-89 range, reflecting the mode (85) falling within this bin.
While manual calculation is useful for learning, in practice, you'll use software tools. Spreadsheet programs like Google Sheets or Microsoft Excel have functions like AVERAGE(), MEDIAN(), MODE.SNGL(), MAX(), MIN(), VAR.S(), and STDEV.S(). Programming languages like Python, with libraries such as NumPy or Pandas, provide similar functions (e.g., mean(), median(), mode(), var(), std()) that make these calculations effortless, especially for large datasets.
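As one illustration of letting software do the work, the sketch below uses Python's built-in `statistics` module rather than NumPy or Pandas, simply so that it runs without any extra installation. Its `variance()` and `stdev()` functions use the sample (n − 1) formulas, matching the manual results above.

```python
import statistics

# Quiz scores from the example dataset.
scores = [85, 90, 78, 92, 85, 88, 76, 95, 85, 90]

summary = {
    "mean": statistics.mean(scores),
    "median": statistics.median(scores),
    "mode": statistics.mode(scores),
    "range": max(scores) - min(scores),
    "sample variance": statistics.variance(scores),   # divides by n - 1
    "sample std dev": statistics.stdev(scores),       # sqrt of sample variance
}

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

If you switch to NumPy, note that `numpy.var` and `numpy.std` divide by n by default (`ddof=0`), so pass `ddof=1` to get the sample versions. The pandas `var()` and `std()` methods default to `ddof=1`.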
This practical exercise demonstrated how to compute basic descriptive statistics by hand. These numbers provide an important first summary of your data's characteristics and form a foundation for more detailed analysis.
© 2025 ApX Machine Learning