Using Python's Pandas library, we load a small dataset, calculate its main descriptive statistics, and create visualizations to understand the data's characteristics.

Imagine we have collected data on the daily temperatures (in Celsius) recorded over two weeks for a particular city.

Our Sample Dataset:

Daily Temperatures (°C): [22, 25, 19, 21, 24, 26, 23, 20, 22, 25, 28, 24, 21, 23]

## 1. Setting Up and Loading Data

First, ensure you have Pandas installed (`pip install pandas`). We'll use it to manage and analyze our data efficiently. Let's load our temperature data into a Pandas Series, which is like a single column of data.

```python
import pandas as pd
import numpy as np  # Often useful alongside Pandas

# Our temperature data
temperatures_c = [22, 25, 19, 21, 24, 26, 23, 20, 22, 25, 28, 24, 21, 23]

# Create a Pandas Series
temp_series = pd.Series(temperatures_c, name="Daily Temperature (C)")

# Display the series
print(temp_series)
```

## 2. Calculating Measures of Central Tendency

Now, let's find the typical temperature using mean, median, and mode. Pandas provides straightforward methods for this.

```python
# Calculate Mean
mean_temp = temp_series.mean()
print(f"Mean Temperature: {mean_temp:.2f} °C")

# Calculate Median
median_temp = temp_series.median()
print(f"Median Temperature: {median_temp:.2f} °C")

# Calculate Mode
# Note: .mode() returns a Series as there can be multiple modes.
# We print all of them; the Series is empty only if there is no data.
mode_temp = temp_series.mode()
if not mode_temp.empty:
    print(f"Mode Temperature(s): {list(mode_temp)} °C")
else:
    print("No mode found (the series is empty).")
```

Interpretation: The mean gives the average temperature ($ \approx 23.07^\circ C $). The median ($ 23.0^\circ C $) is the middle value when the data is sorted and is less affected by extreme values. The modes ($ [21, 22, 23, 24, 25]^\circ C $) are the most frequently occurring temperatures. In this case, five temperatures each appear twice, indicating a relatively flat distribution in the central part.
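The tie among the modes can be checked with a frequency count. The sketch below is one way to do that, rebuilding the same temperature series so the snippet runs on its own; `value_counts()` is a standard pandas method, while the names `counts`, `max_count`, and `modes` are only illustrative.

```python
import pandas as pd

temperatures_c = [22, 25, 19, 21, 24, 26, 23, 20, 22, 25, 28, 24, 21, 23]
temp_series = pd.Series(temperatures_c, name="Daily Temperature (C)")

# Count how many days each temperature was recorded, ordered by temperature
counts = temp_series.value_counts().sort_index()
print(counts)

# The modes are the values whose count equals the highest count
max_count = counts.max()
modes = counts[counts == max_count].index.tolist()
print(f"Highest count: {max_count}, modes: {modes}")
```

Every value that reaches the highest count is reported, which is why `.mode()` returns a Series instead of a single number.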
For this dataset, the mean ($ \approx 23.07^\circ C $) and median ($ 23.0^\circ C $) are very close, suggesting the data distribution is fairly symmetric.

## 3. Calculating Measures of Dispersion

How much do the temperatures vary day-to-day? Let's calculate the range, variance, and standard deviation.

```python
# Calculate Range
temp_range = temp_series.max() - temp_series.min()
print(f"Temperature Range: {temp_range} °C")

# Calculate Variance
variance_temp = temp_series.var()  # Uses N-1 denominator by default (sample variance)
print(f"Temperature Variance: {variance_temp:.2f} °C^2")

# Calculate Standard Deviation
std_dev_temp = temp_series.std()  # Uses N-1 denominator by default (sample standard deviation)
print(f"Temperature Standard Deviation: {std_dev_temp:.2f} °C")
```

Interpretation: The range ($ 9^\circ C $) tells us the difference between the hottest and coldest days in our sample. The sample variance ($ \approx 6.07\ ^\circ C^2 $) and sample standard deviation ($ \approx 2.46^\circ C $) quantify the spread of the data points around the mean. A standard deviation of 2.46 suggests that most daily temperatures fall roughly within $ 23.07 \pm 2.46^\circ C $, that is, between about 20.6 and 25.5 °C; 10 of the 14 days do.

## 4. Calculating Percentiles and Quartiles

Let's find the quartiles to better understand the data's distribution.

```python
# Calculate Quartiles (25th, 50th, 75th percentiles)
quartiles = temp_series.quantile([0.25, 0.50, 0.75])
print("\nQuartiles:")
print(quartiles)

# Calculate the Interquartile Range (IQR)
q1 = quartiles[0.25]
q3 = quartiles[0.75]
iqr = q3 - q1
print(f"\nInterquartile Range (IQR): {iqr:.2f} °C")
```

Interpretation:

- Q1 (25th percentile) is $ 21.25^\circ C $. About 25% of the days had temperatures at or below this value.
- Q2 (50th percentile) is $ 23.0^\circ C $, which is the same as the median, as expected.
- Q3 (75th percentile) is $ 24.75^\circ C $. About 75% of the days had temperatures at or below this value.
- The IQR ($ 3.5^\circ C $) represents the spread of the middle 50% of the data.

The fractional quartile values arise because Pandas interpolates linearly between neighboring sorted values by default.

## 5. Visualizing the Data

Visualizations often provide insights that raw numbers alone cannot. Let's create a histogram and a box plot.
We'll use Plotly for interactive web-based plots.

### Histogram

A histogram shows the frequency of temperatures falling into different bins.

```json
{
  "data": [
    {
      "type": "histogram",
      "x": [22, 25, 19, 21, 24, 26, 23, 20, 22, 25, 28, 24, 21, 23],
      "marker": {"color": "#228be6", "line": {"color": "#1c7ed6", "width": 1}},
      "name": "Temperatures"
    }
  ],
  "layout": {
    "title": {"text": "Distribution of Daily Temperatures"},
    "xaxis": {"title": {"text": "Temperature (°C)"}},
    "yaxis": {"title": {"text": "Frequency (Number of Days)"}},
    "bargap": 0.1,
    "template": "plotly_white"
  }
}
```

Histogram showing the counts of days within specific temperature ranges.

Interpretation: The histogram visually confirms our earlier observations. Most of the counts fall in the 21-25°C range, matching the modes we calculated. The distribution looks roughly symmetric with a broad central region, consistent with the mean and median being close.

### Box Plot

A box plot provides a compact summary of the distribution, showing the median, quartiles, and potential outliers.

```json
{
  "data": [
    {
      "type": "box",
      "y": [22, 25, 19, 21, 24, 26, 23, 20, 22, 25, 28, 24, 21, 23],
      "name": "Temperatures",
      "boxpoints": "all",
      "jitter": 0.3,
      "pointpos": -1.8,
      "marker": {"color": "#15aabf"},
      "line": {"color": "#1098ad"}
    }
  ],
  "layout": {
    "title": {"text": "Summary of Daily Temperatures"},
    "yaxis": {"title": {"text": "Temperature (°C)"}},
    "template": "plotly_white"
  }
}
```

Box plot illustrating the median (line inside the box), quartiles (box edges), range (whiskers), and individual data points.

Interpretation: The box plot shows the median ($ 23^\circ C $), the IQR (the box spans roughly from $ 21.25^\circ C $ to $ 24.75^\circ C $; the exact box edges depend on the quartile method the plotting library uses), and the overall range via the whiskers (extending from the minimum $ 19^\circ C $ to the maximum $ 28^\circ C $). The individual points are also plotted.
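The whisker behavior can be checked numerically. The sketch below applies the common 1.5 × IQR rule to the same data using Pandas quartiles; the names `lower_fence`, `upper_fence`, and `outliers` are illustrative, and a plotting library may compute its quartiles with a slightly different interpolation rule.

```python
import pandas as pd

temp_series = pd.Series([22, 25, 19, 21, 24, 26, 23, 20, 22, 25, 28, 24, 21, 23])

# Quartiles and interquartile range (pandas uses linear interpolation by default)
q1 = temp_series.quantile(0.25)
q3 = temp_series.quantile(0.75)
iqr = q3 - q1

# Fences: points beyond these limits are conventionally flagged as outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = temp_series[(temp_series < lower_fence) | (temp_series > upper_fence)]
print(f"Fences: [{lower_fence:.2f}, {upper_fence:.2f}]")
print(f"Outliers: {outliers.tolist()}")
```

With these data, every observation lies inside the fences, so the list of outliers is empty.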
In this case, the whiskers extend to the minimum and maximum values because no points fall outside the fences of the standard 1.5 * IQR rule ($ Q1 - 1.5 \times IQR = 16^\circ C $ and $ Q3 + 1.5 \times IQR = 30^\circ C $), so none are classified as outliers.

## Summary

By applying these descriptive statistics techniques and visualizations, we've transformed a simple list of temperatures into a meaningful summary. We understand the typical temperature (around $ 23^\circ C $), the variability (a sample standard deviation of $ \approx 2.46^\circ C $), and the overall distribution shape. This process of summarizing data is a fundamental first step in any data analysis or machine learning task. As you encounter larger and more complex datasets, these foundational techniques will remain essential for gaining initial insights.
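As a compact recap, most of the figures in this walkthrough can be reproduced in a single call. The sketch below uses `Series.describe()`, which reports the count, mean, sample standard deviation, minimum, quartiles, and maximum. The `ddof=0` line is included only to illustrate how the population formula (dividing by N) differs from the sample formula (dividing by N-1); the variable names are illustrative.

```python
import pandas as pd

temp_series = pd.Series(
    [22, 25, 19, 21, 24, 26, 23, 20, 22, 25, 28, 24, 21, 23],
    name="Daily Temperature (C)",
)

# One-call summary: count, mean, std (sample), min, 25%, 50%, 75%, max
summary = temp_series.describe()
print(summary.round(2))

# Sample (N-1) versus population (N) standard deviation
sample_std = temp_series.std()            # default ddof=1
population_std = temp_series.std(ddof=0)  # divide by N instead of N-1
print(f"Sample std: {sample_std:.2f} °C, population std: {population_std:.2f} °C")
```

For a sample of only 14 days the two standard deviations differ noticeably in the second decimal place; the gap shrinks as the number of observations grows.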