While understanding the formulas for mean, variance, correlation, and other descriptive statistics is fundamental, calculating them manually for anything larger than a toy dataset quickly becomes impractical. This is where Python's data analysis libraries, particularly Pandas, become indispensable tools for data scientists and machine learning practitioners. Pandas provides efficient and easy-to-use functions to compute a wide array of descriptive statistics on your data, typically stored in Series or DataFrame objects.

Let's assume you have your data loaded into a Pandas DataFrame. If you're following along, you can create a sample DataFrame like this:

```python
import pandas as pd
import numpy as np

# Create sample data
data = {'ExamScore': [78, 85, 92, 65, 72, 88, 95, 81, 76, 80, np.nan, 83],
        'StudyHours': [5, 6, 8, 3, 4, 7, 9, 5.5, 4.5, 5, 2, 6],
        'SleepHours': [7, 6.5, 7.5, 8, 7, 6, 7, 7.5, 8, 6.5, 9, 7]}
df = pd.DataFrame(data)
print(df)
```

This creates a DataFrame `df` with scores, study hours, and sleep hours for different students, including one missing exam score (`np.nan`).

## The .describe() Method: A Quick Overview

Often, the first step in exploring a numerical dataset with Pandas is the `.describe()` method. It provides a concise summary of the central tendency, dispersion, and shape of the distribution for each numerical column in a DataFrame (or for a single Series).

```python
# Get summary statistics for numerical columns
summary_stats = df.describe()
print(summary_stats)
```

Running this will produce output similar to:

```text
       ExamScore  StudyHours  SleepHours
count  11.000000   12.000000   12.000000
mean   81.363636    5.416667    7.250000
std     8.697962    1.986698    0.811844
min    65.000000    2.000000    6.000000
25%    77.000000    4.375000    6.875000
50%    81.000000    5.250000    7.000000
75%    86.500000    6.250000    7.625000
max    95.000000    9.000000    9.000000
```

Notice a few things:

- **count**: Shows the number of non-missing values.
ExamScore has 11, reflecting the `np.nan`.
- **mean**: The average value.
- **std**: The standard deviation, measuring spread.
- **min**, **max**: Minimum and maximum values.
- **25%**, **50%**, **75%**: These are the quartiles (percentiles). The 50th percentile is the median.

The `.describe()` method is excellent for getting a quick feel for your data's distribution and scale.

## Calculating Individual Statistics

While `.describe()` is convenient, you'll often need specific statistics. Pandas provides dedicated methods for these. You can apply them to a whole DataFrame (calculating the statistic for each column) or a single Series (a specific column).

### Measures of Central Tendency

```python
# Calculate mean for each column
means = df.mean()
print("Means:\n", means)

# Calculate median for the 'ExamScore' column
median_score = df['ExamScore'].median()
print(f"\nMedian Exam Score: {median_score}")

# Calculate mode for 'SleepHours'
# Mode can return multiple values if they have the same highest frequency
modes_sleep = df['SleepHours'].mode()
print("\nMode(s) for Sleep Hours:\n", modes_sleep)
```

These functions automatically handle missing values by default (for `.mean()` and `.median()` this is controlled by the `skipna=True` parameter; `.mode()` uses `dropna=True`).

### Measures of Dispersion and Position

```python
# Calculate variance for each column
variances = df.var()
print("Variances:\n", variances)

# Calculate standard deviation for 'StudyHours'
std_study = df['StudyHours'].std()
print(f"\nStandard Deviation Study Hours: {std_study:.4f}")

# Calculate minimum and maximum values
min_values = df.min()
max_values = df.max()
print("\nMinimum Values:\n", min_values)
print("\nMaximum Values:\n", max_values)

# Calculate specific percentiles (e.g., 10th and 90th) for 'ExamScore'
p10 = df['ExamScore'].quantile(0.10)
p90 = df['ExamScore'].quantile(0.90)
print(f"\n10th Percentile Exam Score: {p10}")
print(f"90th Percentile Exam Score: {p90}")

# Calculate the Interquartile Range (IQR) for 'ExamScore'
q1 = df['ExamScore'].quantile(0.25)
q3 = df['ExamScore'].quantile(0.75)
iqr = q3 - q1
print(f"IQR for Exam Score: {iqr}")
```

The `.quantile(q)` method is versatile for finding any percentile, where $q$ is between 0 and 1. The range can be calculated simply by subtracting the result of `.min()` from `.max()`.

### Measures of Shape

Skewness and kurtosis tell you about the asymmetry and peakedness of the distribution, respectively.

```python
# Calculate skewness for each column
skewness = df.skew()
print("Skewness:\n", skewness)

# Calculate kurtosis for 'StudyHours'
kurt_study = df['StudyHours'].kurt()  # Fisher's definition (normal dist = 0)
# kurt_study = df['StudyHours'].kurtosis()  # Same as .kurt()
print(f"\nKurtosis for Study Hours: {kurt_study:.4f}")
```

Positive skewness indicates a tail extending towards higher values, while negative skewness indicates a tail towards lower values. Kurtosis measures the "tailedness"; higher kurtosis means more outliers or heavier tails compared to a normal distribution.

## Correlation Analysis

To understand the linear relationship between pairs of numerical variables, use the `.corr()` method on the DataFrame.

```python
# Calculate the pairwise correlation between columns
correlation_matrix = df.corr()
print("\nCorrelation Matrix:\n", correlation_matrix)
```

This outputs a correlation matrix where each cell $(i, j)$ contains the Pearson correlation coefficient between column $i$ and column $j$. The diagonal elements are always 1 (correlation of a variable with itself). Missing values are dropped pairwise, so the ExamScore correlations use the 11 complete pairs.

```text
            ExamScore  StudyHours  SleepHours
ExamScore    1.000000    0.981855   -0.448009
StudyHours   0.981855    1.000000   -0.591823
SleepHours  -0.448009   -0.591823    1.000000
```

From this, we see a strong positive correlation ($\approx 0.98$) between ExamScore and StudyHours, suggesting students who study more tend to get higher scores. There's a moderate negative correlation between StudyHours and SleepHours, perhaps indicating that more study time might correlate with slightly less sleep in this sample.
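To connect `.corr()` back to the underlying formula, the Pearson coefficient can be reproduced by hand as the covariance divided by the product of the standard deviations. The sketch below is just a sanity check on the sample data, mirroring the pairwise dropping of missing values that `.corr()` performs:

```python
import pandas as pd
import numpy as np

# Recreate the sample data from above (only the two columns we compare)
data = {'ExamScore': [78, 85, 92, 65, 72, 88, 95, 81, 76, 80, np.nan, 83],
        'StudyHours': [5, 6, 8, 3, 4, 7, 9, 5.5, 4.5, 5, 2, 6]}
df = pd.DataFrame(data)

# Keep only rows where both values are present, mirroring
# the pairwise NaN handling of .corr()
pair = df.dropna()

# Pearson r = cov(x, y) / (std(x) * std(y))
manual_r = (pair['ExamScore'].cov(pair['StudyHours'])
            / (pair['ExamScore'].std() * pair['StudyHours'].std()))

pandas_r = df['ExamScore'].corr(df['StudyHours'])
print(f"manual: {manual_r:.6f}, pandas: {pandas_r:.6f}")
```

The two values agree up to floating-point precision, since both use the sample (ddof=1) covariance and standard deviation.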
Remember, correlation does not imply causation!

## Visualizing Descriptive Statistics

While numerical summaries are essential, visualizing the data often provides deeper insights. Pandas integrates with Matplotlib, allowing for quick plots directly from DataFrames or Series. For more customized or advanced plots, libraries like Seaborn or Plotly are commonly used.

Here's a Plotly figure specification you might use to visualize the distribution of ExamScore after calculating the statistics:

```json
{"layout": {"title": "Distribution of Exam Scores",
            "xaxis": {"title": "Exam Score"},
            "yaxis": {"title": "Frequency"},
            "bargap": 0.1,
            "template": "plotly_white"},
 "data": [{"type": "histogram",
           "x": [78, 85, 92, 65, 72, 88, 95, 81, 76, 80, 83],
           "marker": {"color": "#228be6"}}]}
```

*Histogram showing the frequency distribution of exam scores in the sample data.*

This histogram complements the numerical statistics (like mean, median, skewness) by showing the shape of the score distribution visually.

Using Pandas effectively allows you to move quickly from raw data to meaningful statistical summaries, forming a basis for further analysis, visualization, and model building in machine learning workflows.
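As a closing sketch, the individual methods covered above can be bundled into one reusable helper. Note that `numeric_summary` is a hypothetical name for illustration, not part of the Pandas API:

```python
import pandas as pd
import numpy as np

def numeric_summary(s: pd.Series) -> pd.Series:
    """Collect the descriptive statistics discussed above for one Series."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    return pd.Series({
        'count': s.count(),      # number of non-missing values
        'mean': s.mean(),
        'median': s.median(),
        'std': s.std(),          # sample standard deviation (ddof=1)
        'iqr': q3 - q1,          # interquartile range
        'skew': s.skew(),
        'kurtosis': s.kurt(),    # Fisher's definition
    })

scores = pd.Series([78, 85, 92, 65, 72, 88, 95, 81, 76, 80, np.nan, 83])
print(numeric_summary(scores))
```

Because every method skips `NaN` by default, the helper works unchanged on columns with missing data.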