In this section, we apply descriptive statistics to a sample dataset in Python to build an initial understanding of the data before moving on to more complex modeling. We use the Pandas library, indispensable for data manipulation and analysis in Python, to perform the summarization.

## Setting Up Your Environment

First, ensure you have the necessary libraries installed. If not, you can typically install them using pip:

```bash
pip install pandas numpy scipy matplotlib seaborn plotly
```

Now, let's import the libraries we'll use in our Python script or Jupyter Notebook:

```python
import pandas as pd
import numpy as np
import scipy.stats as stats
import plotly.express as px
import plotly.graph_objects as go
```

## Loading and Inspecting the Data

For this exercise, we'll work with a simulated dataset representing some measurements. Imagine these could be sensor readings, user activity metrics, or any set of observations you might encounter. Let's create this dataset directly using Pandas and NumPy.

```python
# Seed for reproducibility
np.random.seed(42)

# Generate data
n_samples = 150
feature_a = np.random.normal(loc=50, scale=15, size=n_samples)
data = {
    'feature_A': feature_a,
    'feature_B': np.random.gamma(shape=2, scale=10, size=n_samples) + 20,  # Skewed
    'feature_C': 0.7 * feature_a
                 + np.random.normal(loc=0, scale=5, size=n_samples) + 10,  # Correlated with A
    'category': np.random.choice(['Type X', 'Type Y', 'Type Z'],
                                 size=n_samples, p=[0.4, 0.35, 0.25])
}
df = pd.DataFrame(data)

# Ensure no negative values for typical features
df['feature_A'] = df['feature_A'].clip(lower=0)
df['feature_B'] = df['feature_B'].clip(lower=0)
df['feature_C'] = df['feature_C'].clip(lower=0)

print("Dataset dimensions:", df.shape)
print("\nFirst 5 rows:")
print(df.head())
print("\nData types and non-null counts:")
df.info()
```

The output from `df.head()` gives us a quick look at the first few rows, while `df.info()` reports the number of entries, the column names, the count of non-null values per column, and the data type of each column. We see 150 entries and 4 columns, with no missing values in this case. `feature_A`, `feature_B`, and `feature_C` are numerical (`float64`), and `category` is categorical (`object`).

## Overall Summary Statistics

The `.describe()` method in Pandas is excellent for getting a quick statistical summary of the numerical columns.

```python
# Get summary statistics for numerical columns
summary_stats = df.describe()
print("\nSummary Statistics:")
print(summary_stats)
```

This output provides several important statistics we've discussed:

- `count`: The number of non-missing observations.
- `mean`: The average value.
- `std`: The standard deviation, measuring spread.
- `min`: The minimum value.
- `25%`: The first quartile (Q1).
- `50%`: The median (second quartile, Q2).
- `75%`: The third quartile (Q3).
- `max`: The maximum value.

Looking at the output, we can already make some observations:

- `feature_A` has a mean around 50, close to its median, suggesting a relatively symmetric distribution. Its standard deviation is about 14.
- `feature_B` has a mean (around 40) noticeably larger than its median (around 36), which suggests a right-skewed distribution. Its range (max - min) is also quite large compared to `feature_A`.
- `feature_C` has characteristics somewhat similar to `feature_A`, with a mean close to its median.
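By default, `.describe()` summarizes only the numerical columns. If you also want a quick look at `category`, one option is the standard `include` parameter, as in this small sketch:

```python
# Summarize all columns; for 'category' this reports the count,
# the number of unique values, the top value, and its frequency
print(df.describe(include='all'))
```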
## Calculating Specific Measures

While `.describe()` is useful, sometimes we need specific statistics, or want to calculate them individually for clarity or for non-numerical columns (like the mode).

### Central Tendency

Let's calculate the mean, median, and mode for `feature_B`, which appeared skewed.

```python
# Central tendency for feature_B
mean_b = df['feature_B'].mean()
median_b = df['feature_B'].median()
mode_b = df['feature_B'].mode()  # Mode can return multiple values if they share the highest frequency

print(f"\nFeature B - Mean: {mean_b:.2f}")
print(f"Feature B - Median: {median_b:.2f}")
print(f"Feature B - Mode(s): {mode_b.tolist()}")  # Display modes as a list

# Mode for the categorical column
mode_category = df['category'].mode()
print(f"\nCategory - Mode(s): {mode_category.tolist()}")
```

As expected, the mean of `feature_B` is pulled higher than the median by the right skew. The mode represents the most frequent value(s). For the categorical feature, 'Type X' is the most common category.

### Dispersion

Let's examine the spread of `feature_A` and `feature_B`.

```python
# Dispersion for feature_A
variance_a = df['feature_A'].var()
std_dev_a = df['feature_A'].std()
range_a = df['feature_A'].max() - df['feature_A'].min()
iqr_a = df['feature_A'].quantile(0.75) - df['feature_A'].quantile(0.25)

print(f"\nFeature A - Variance: {variance_a:.2f}")
print(f"Feature A - Standard Deviation: {std_dev_a:.2f}")
print(f"Feature A - Range: {range_a:.2f}")
print(f"Feature A - Interquartile Range (IQR): {iqr_a:.2f}")

# Dispersion for feature_B
variance_b = df['feature_B'].var()
std_dev_b = df['feature_B'].std()
range_b = df['feature_B'].max() - df['feature_B'].min()
iqr_b = df['feature_B'].quantile(0.75) - df['feature_B'].quantile(0.25)

print(f"\nFeature B - Variance: {variance_b:.2f}")
print(f"Feature B - Standard Deviation: {std_dev_b:.2f}")
print(f"Feature B - Range: {range_b:.2f}")
print(f"Feature B - Interquartile Range (IQR): {iqr_b:.2f}")
```

Comparing the standard deviations (14.06 for A vs. 14.49 for B) doesn't immediately reveal the difference in shape, but comparing the ranges (64.65 vs. 76.56) and IQRs (17.50 vs. 16.26) starts to hint at `feature_B` having more extreme values on one side (the right side, as indicated by the skew). The IQR is often more robust to outliers than the range or standard deviation.
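That robustness is why the IQR underpins a common outlier rule of thumb: flag anything more than 1.5 × IQR beyond the quartiles. Here is a minimal sketch applied to `feature_B` (this is the same fence that box plot whiskers, used later in this section, conventionally use):

```python
# Flag values beyond 1.5 * IQR from the quartiles
q1, q3 = df['feature_B'].quantile([0.25, 0.75])
fence = 1.5 * (q3 - q1)
mask = (df['feature_B'] < q1 - fence) | (df['feature_B'] > q3 + fence)
print(f"Potential outliers in feature_B: {df.loc[mask, 'feature_B'].round(2).tolist()}")
```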
### Shape: Skewness and Kurtosis

Let's quantify the shape using skewness and kurtosis, via the `scipy.stats` module.

```python
# Shape for feature_A
skew_a = stats.skew(df['feature_A'])
kurt_a = stats.kurtosis(df['feature_A'])  # Fisher's definition (normal == 0)

print(f"\nFeature A - Skewness: {skew_a:.2f}")
print(f"Feature A - Kurtosis: {kurt_a:.2f}")

# Shape for feature_B
skew_b = stats.skew(df['feature_B'])
kurt_b = stats.kurtosis(df['feature_B'])

print(f"\nFeature B - Skewness: {skew_b:.2f}")
print(f"Feature B - Kurtosis: {kurt_b:.2f}")
```

- `feature_A` has a skewness close to 0 (-0.12), confirming its relative symmetry. Its kurtosis is also near 0 (-0.22), suggesting a peak similar to a normal distribution.
- `feature_B` has a positive skewness (1.02), confirming our earlier observation of a right skew (the tail extends to the right). The positive kurtosis (1.15) indicates slightly heavier tails and a sharper peak compared to a normal distribution.

## Correlation Analysis

Now, let's examine the linear relationships between the numerical features.

```python
# Calculate the correlation matrix
correlation_matrix = df[['feature_A', 'feature_B', 'feature_C']].corr()
print("\nCorrelation Matrix:")
print(correlation_matrix)
```

The correlation matrix shows the Pearson correlation coefficient between each pair of variables:

- The diagonal elements are always 1 (the correlation of a variable with itself).
- There is a strong positive correlation (around 0.70) between `feature_A` and `feature_C`, as expected from our data generation process.
- `feature_B` shows weak correlations with `feature_A` (around 0.05) and `feature_C` (around 0.09).

Remember, correlation measures linear association. A low correlation doesn't necessarily mean no relationship exists, just not a linear one. And critically, correlation does not imply causation: even though A and C are correlated, we cannot conclude that A causes C (or vice versa) based on this value alone.
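If you suspect a relationship that is monotonic but not linear, one standard alternative is the Spearman rank correlation, which Pandas exposes through the `method` parameter of `.corr()`:

```python
# Spearman correlation ranks the data first, so it captures
# monotonic (not just linear) association
spearman_matrix = df[['feature_A', 'feature_B', 'feature_C']].corr(method='spearman')
print("\nSpearman Correlation Matrix:")
print(spearman_matrix)
```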
## Visualizing the Summaries

Numerical summaries are powerful, but visualizations often provide a more intuitive understanding.

### Histograms

Histograms help visualize the distribution of a single numerical variable.

```python
# Histogram for feature_A
fig_hist_a = px.histogram(df, x='feature_A', nbins=20,
                          title='Distribution of Feature A',
                          color_discrete_sequence=['#339af0'])  # Blue
fig_hist_a.update_layout(bargap=0.1)
fig_hist_a.show()

# Histogram for feature_B
fig_hist_b = px.histogram(df, x='feature_B', nbins=20,
                          title='Distribution of Feature B',
                          color_discrete_sequence=['#20c997'])  # Teal
fig_hist_b.update_layout(bargap=0.1)
fig_hist_b.show()
```

*Figure: Distribution of Feature A.* The histogram of `feature_A` shows a roughly bell-shaped, symmetric distribution centered near 50.

*Figure: Distribution of Feature B.* The histogram of `feature_B` clearly shows the right skew, with most values clustered on the left and a tail extending towards higher values.

### Box Plots

Box plots are excellent for comparing distributions, or for summarizing a single distribution's quartiles, median, and potential outliers.

```python
# Box plot for all numerical features
fig_box = px.box(df, y=['feature_A', 'feature_B', 'feature_C'],
                 title='Box Plots of Numerical Features',
                 color_discrete_sequence=['#339af0', '#20c997', '#7048e8'])  # Blue, Teal, Violet
fig_box.show()
```

*Figure: Box Plots of Numerical Features.* The box plots visually compare the median (the line inside the box), the IQR (the box itself), and the range (the whiskers) of the features; outliers are plotted as individual points. Note the longer upper whisker and outlier points for `feature_B`, indicating the right skew.
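The `category` column hasn't appeared in any plot yet. As a quick sketch beyond the original figures, box plots also make it easy to compare a feature's distribution across groups:

```python
# Box plots of feature_A split by category -- useful for spotting
# distributional differences between groups
fig_box_cat = px.box(df, x='category', y='feature_A', color='category',
                     title='Feature A by Category')
fig_box_cat.show()
```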
"colorscale": {"sequential": [[0.0, "#4263eb"], [0.1111111111111111, "#4c6ef5"], [0.2222222222222222, "#5c7cfa"], [0.3333333333333333, "#748ffc"], [0.4444444444444444, "#91a7ff"], [0.5555555555555556, "#bac8ff"], [0.6666666666666666, "#d0bfff"], [0.7777777777777778, "#e599f7"], [0.8888888888888888, "#f06595"], [1.0, "#fa5252"]], "sequentialminus": [[0.0, "#fa5252"], [0.1111111111111111, "#f06595"], [0.2222222222222222, "#e599f7"], [0.3333333333333333, "#d0bfff"], [0.4444444444444444, "#bac8ff"], [0.5555555555555556, "#91a7ff"], [0.6666666666666666, "#748ffc"], [0.7777777777777778, "#5c7cfa"], [0.8888888888888888, "#4c6ef5"], [1.0, "#4263eb"]]}, "coloraxis": {"colorbar": {"title": {"text": ""}}}, "yaxis_title_text": ""}, "data": [{"type": "box", "y": [57.45, 72.67, 41.46, 41.39, 68.66, 50.68, 46.96, 40.59, 48.48, 31.53, 46.11, 50.59, 52.31, 28.19, 60.27, 37.33, 53.79, 67.92, 29.22, 40.72, 56.5 , 46.84, 57.34, 47.77, 33.13, 35.89, 56.45, 39.71, 55.53, 40.4 , 54.23, 46.05, 51.93, 68.02, 44.07, 69.76, 66.6 , 60.91, 47.17, 62.4 , 36.39, 54.19, 31.81, 60.96, 45.62, 36.58, 57.64, 60.18, 28.3 , 46.98, 64.87, 46.04, 29.89, 54.71, 41.59, 37.6 , 48.86, 56.75, 44.51, 75.75, 43.26, 51.66, 57.37, 51.98, 46.7 , 45.89, 61.31, 37.82, 36.9 , 54.37, 50.95, 38.66, 31.57, 42.96, 51.66, 38.14, 48.42, 51.23, 34.75, 51.46, 39.97, 45.94, 52.05, 62.32, 43.64, 58.1 , 61.4 , 51.36, 48.16, 53.58, 51.83, 50.8 , 39.68, 48.09, 59.17, 49.31, 32.5 , 58.79, 55.16, 47.7 , 62.59, 48.03, 48.78, 46.73, 47.87, 56.5 , 56.38, 44.61, 59.51, 52.25, 41.81, 48.52, 52.09, 53.77, 61.09, 38.9 , 60.51, 39.4 , 44.23, 37.45, 43.07, 53.92, 47.4 , 49.34, 50.28, 31.34, 32.19, 42.54, 61.74, 31.59, 47.28, 48.61, 53.66, 48.82, 45.5 , 47.7 , 53.54, 40.67, 44.22, 71.28, 47.65, 49.15, 60.73, 53.69, 51.12], "name": "feature_A", "boxpoints": "outliers", "marker": {"color": "#339af0"}, "xaxis": "x", "yaxis": "y", "showlegend": false}, {"type": "box", "y": [33.26, 42.31, 28.52, 29.4 , 23.84, 57.44, 46.97, 36.07, 33.29, 41.75, 41.29, 31.29, 31.8 , 32.4 , 21.89, 40.18, 41.09, 49.44, 24.89, 21.37, 26.66, 25.42, 36.79, 34.16, 32.36, 43.97, 34.68, 35.93, 27.85, 31.46, 46.05, 24.8 , 47.79, 57.09, 20.82, 51.06, 32.69, 25.62, 28.44, 42.3 , 51.21, 48.72, 48.5 , 56.82, 47.11, 24.67, 30.9 , 55.94, 45.16, 49.87, 49.03, 27.11, 29.91, 31.45, 36.33, 29.87, 34.21, 42.35, 25.78, 54.96, 45.97, 45.3 , 54.15, 36.85, 35.85, 32.98, 25.71, 30.64, 48.41, 48.62, 33.32, 21.7 , 48.29, 43.33, 31.91, 37.29, 27.35, 29.87, 23.64, 20.8 , 39.93, 21.51, 58.52, 35.3 , 35.78, 32.24, 38.61, 59.21, 38.89, 25.34, 35.7 , 30.11, 35.92, 27.92, 43.9 , 29.5 , 40.53, 50.33, 32.98, 42.45, 26.65, 42.33, 29.53, 28.33, 39.89, 36.97, 38.71, 37.81, 22.36, 56.97, 34.37, 36.89, 31.95, 65.3 , 65.86, 34.28, 39.31, 55.74, 20.31, 32.18, 96.87, 49.47, 33.9 , 43.25, 37.08, 44.68, 36.18, 38.88, 25.58, 29.29, 41.64, 37.6 , 36.78, 22.36, 30.05, 25.57, 47.36, 26.24, 25.57, 33.81, 40.41, 21.16, 45.09, 36.45, 47.54, 34.26, 22.11, 39.09, 58.37, 36.38, 44.78], "name": "feature_B", "boxpoints": "outliers", "marker": {"color": "#20c997"}, "xaxis": "x", "yaxis": "y", "showlegend": false}, {"type": "box", "y": [51.68, 60.95, 37.15, 38.76, 59.99, 43.78, 40.75, 39.45, 43.3 , 31.09, 39.63, 47.38, 49.05, 29.66, 47.95, 36.86, 46.24, 62.28, 24.73, 41.4 , 50.76, 41.77, 54.64, 42.59, 32.1 , 35.63, 46.86, 35.55, 54.2 , 32.73, 47.41, 38.48, 47.01, 54.44, 38.99, 56.25, 57.68, 49.12, 45.87, 54.55, 31.44, 43.32, 30.22, 53.95, 38.1 , 34.34, 52.09, 51.31, 29.91, 44.09, 50.57, 46.15, 31.07, 48.52, 36.04, 39.66, 46.12, 46.74, 39.79, 62.76, 
### Scatter Plots

Scatter plots help visualize the relationship between two numerical variables. Let's plot the highly correlated `feature_A` and `feature_C`.

```python
# Scatter plot for feature_A vs feature_C
fig_scatter = px.scatter(df, x='feature_A', y='feature_C',
                         title='Feature A vs Feature C',
                         trendline='ols',  # Add an ordinary least squares regression line
                         color_discrete_sequence=['#be4bdb'])  # Grape
fig_scatter.show()
```

*Figure: Feature A vs Feature C.* The scatter plot shows a positive linear trend between `feature_A` and `feature_C`, confirming the correlation coefficient calculated earlier; the points cluster around the regression line.
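Plotly fits the `trendline='ols'` line using the statsmodels package (which must be installed for the option to work), and you can retrieve the underlying fit rather than just the drawn line. A small sketch:

```python
# Pull the statsmodels OLS results back out of the figure
trend_results = px.get_trendline_results(fig_scatter)
ols_fit = trend_results.iloc[0]['px_fit_results']
print(f"R-squared of the fitted line: {ols_fit.rsquared:.3f}")
```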
## Conclusion

In this practice section, we applied the descriptive statistics concepts from this chapter to a sample dataset. Using Pandas, SciPy, and Plotly, we calculated measures of central tendency, dispersion, and shape, computed correlations, and created visualizations such as histograms, box plots, and scatter plots.

Summarizing a dataset like this is a fundamental first step in any data analysis or machine learning project. It helps you:

- Understand the basic properties and distributions of your variables.
- Identify potential issues such as skewness or outliers.
- Discover relationships between variables.
- Communicate important characteristics of the data effectively.

Armed with these summary insights, you are better prepared to choose appropriate data preprocessing techniques, select suitable machine learning models, and interpret the results of further analyses.