Descriptive statistics summarize the key features of a dataset in a handful of numbers. They show where the values are centered and how widely they are spread, which lays the groundwork for more advanced analyses. In this section, we will explore the essential measures (mean, median, mode, variance, and standard deviation) and demonstrate how to compute them using NumPy and Pandas.
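Before diving into the code, let's briefly define the key statistical measures. The mean is the arithmetic average: the sum of all values divided by the number of values. The median is the middle value once the data is sorted, or the average of the two middle values when the count is even. The mode is the value that occurs most often. The variance is the average squared deviation from the mean, and the standard deviation is its square root, which puts the spread back in the same units as the data.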
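To make these measures concrete before bringing in any library, here is a minimal sketch that computes each one by hand with plain Python. It reuses the sample values from the examples in this section and assumes the population form of the variance (dividing by n); the variable names are illustrative only.

```python
from collections import Counter
from math import sqrt

data = [10, 15, 14, 10, 18, 20, 25, 30]
n = len(data)

# Mean: sum of the values divided by how many there are
mean = sum(data) / n

# Median: middle of the sorted values (average of the two middle values when n is even)
ordered = sorted(data)
mid = n // 2
median = ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2

# Mode: the most frequent value (if several values tie, this picks the one seen first)
mode = Counter(data).most_common(1)[0][0]

# Population variance: mean of the squared deviations from the mean
variance = sum((x - mean) ** 2 for x in data) / n

# Standard deviation: square root of the variance
std_dev = sqrt(variance)

print(mean, median, mode, variance)  # 17.75 16.5 10 43.6875
print(round(std_dev, 4))
```

Writing the calculations out once makes it easier to check what the library functions return for the same data.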
NumPy is a powerful library for numerical computations in Python. It offers simple functions to efficiently calculate these statistical measures.
Here's how you can calculate these statistics using NumPy:
import numpy as np
data = np.array([10, 15, 14, 10, 18, 20, 25, 30])
# Calculate Mean
mean = np.mean(data)
print(f"Mean: {mean}")
# Calculate Median
median = np.median(data)
print(f"Median: {median}")
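# Calculate Mode - NumPy has no built-in mode function, so use SciPy
from scipy import stats
mode_result = stats.mode(data)
# Older SciPy releases return a length-1 array here and newer ones return a scalar;
# np.atleast_1d handles both cases
mode = np.atleast_1d(mode_result.mode)[0]
print(f"Mode: {mode}")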
# Calculate Variance (np.var defaults to ddof=0, the population variance)
variance = np.var(data)
print(f"Variance: {variance}")
# Calculate Standard Deviation (np.std also defaults to ddof=0)
std_dev = np.std(data)
print(f"Standard Deviation: {std_dev}")
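One detail worth checking in the NumPy example is the divisor used for the variance. By default, `np.var` and `np.std` divide by n (the population form, `ddof=0`). If the data is a sample from a larger population, passing `ddof=1` divides by n - 1 instead. The short sketch below, using the same sample values, shows both; the variable names are illustrative.

```python
import numpy as np

data = np.array([10, 15, 14, 10, 18, 20, 25, 30])

# Population variance: divide by n (NumPy's default, ddof=0)
pop_var = np.var(data)

# Sample variance and standard deviation: divide by n - 1 (ddof=1)
sample_var = np.var(data, ddof=1)
sample_std = np.std(data, ddof=1)

print(f"Population variance: {pop_var}")
print(f"Sample variance: {sample_var:.4f}")
print(f"Sample standard deviation: {sample_std:.4f}")
```

Which form is appropriate depends on whether the values are the whole population or a sample drawn from it.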
Pandas builds on NumPy and provides higher-level operations for data manipulation. It is especially useful for working with labeled data, where each column has a name and each row has an index.
Consider a simple Pandas DataFrame:
import pandas as pd
data = {'Scores': [10, 15, 14, 10, 18, 20, 25, 30]}
df = pd.DataFrame(data)
# Calculate Mean
mean = df['Scores'].mean()
print(f"Mean: {mean}")
# Calculate Median
median = df['Scores'].median()
print(f"Median: {median}")
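# Calculate Mode (Series.mode() returns every value tied for most frequent, sorted)
mode = df['Scores'].mode()
# Take the first (smallest) mode
print(f"Mode: {mode[0]}")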
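# Calculate Variance (Pandas defaults to ddof=1, the sample variance,
# so this value differs from np.var, which defaults to ddof=0)
variance = df['Scores'].var()
print(f"Variance: {variance}")
# Calculate Standard Deviation (also ddof=1 by default)
std_dev = df['Scores'].std()
print(f"Standard Deviation: {std_dev}")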
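Instead of calling each method separately, Pandas can also report several statistics at once. The sketch below is one possible way to do this with `describe()` and `agg()` on the same `Scores` column; it also passes `ddof=0` to `var()` to reproduce NumPy's default population variance. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Scores': [10, 15, 14, 10, 18, 20, 25, 30]})
scores = df['Scores']

# describe() reports count, mean, std (ddof=1), min, quartiles, and max
summary = scores.describe()
print(summary)

# agg() computes a chosen set of statistics in one call
selected = scores.agg(['mean', 'median', 'var', 'std'])
print(selected)

# Pass ddof=0 to get the population variance, matching np.var's default
pop_var = scores.var(ddof=0)
print(f"Population variance: {pop_var}")
```

Note that the `50%` row in the `describe()` output is the median.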
Using NumPy and Pandas, you can compute descriptive statistics in a few lines of code to better understand your dataset. These measures form the foundation of exploratory data analysis, allowing you to identify the center and spread of your data and spot unusual values. As we progress through this chapter, remember that these statistics are just the beginning. They will help you interpret data meaningfully and prepare it for further, more complex analyses and visualizations.
© 2025 ApX Machine Learning