This exercise applies univariate analysis techniques to a real dataset. It covers how to calculate descriptive statistics and create visualizations for both numerical and categorical variables using Python libraries.

Setting the Stage: Loading the Data

We'll use the 'penguins' dataset, which is conveniently available through the Seaborn library. This dataset contains measurements for different penguin species. First, make sure Seaborn and Pandas are installed, then import them and load the dataset:

```python
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

# Load the dataset (Seaborn downloads it the first time and caches it locally)
penguins_df = sns.load_dataset('penguins')

# Display basic info and the first few rows to understand the data
print("Dataset Information:")
penguins_df.info()
print("\nFirst 5 Rows:")
print(penguins_df.head())

# Handle missing values simply for this exercise:
# drop every row that has a missing value in any column.
# (In a real scenario, you'd use the techniques from Chapter 2.)
penguins_df = penguins_df.dropna()
print("\nDataset Information after dropping NaNs:")
penguins_df.info()
```

This initial output shows the columns, their data types (float64 for the numerical measurements, object for categorical strings such as species), and confirms that some values are missing. For simplicity in this practice section, we removed those rows. Note that dropna() without arguments drops a row if any of its columns is missing, not only the columns analyzed below.

Analyzing a Numerical Variable: flipper_length_mm

Let's examine the flipper length of the penguins.

Descriptive Statistics

We can get a quick statistical summary by calling the .describe() method on this single column (a Series).

```python
# Calculate descriptive statistics for flipper_length_mm
flipper_stats = penguins_df['flipper_length_mm'].describe()
print("\nDescriptive Statistics for Flipper Length (mm):")
print(flipper_stats)

# Calculate the median separately (it also appears as the 50% row above)
flipper_median = penguins_df['flipper_length_mm'].median()
print(f"\nMedian Flipper Length: {flipper_median} mm")

# Calculate skewness
flipper_skew = penguins_df['flipper_length_mm'].skew()
print(f"Skewness of Flipper Length: {flipper_skew:.2f}")
```

Interpretation: The output from .describe() provides the count, mean, standard deviation (std, the sample standard deviation $s$), minimum, maximum, and quartiles (25th percentile or Q1, 50th percentile or median, 75th percentile or Q3). The median marks the center of the data and, unlike the mean, is resistant to outliers. The skewness value gives a quick numerical check on the distribution's shape: a value close to 0 indicates approximate symmetry, a positive value indicates a right skew (a longer tail toward large values), and a negative value a left skew. Skewness measures only asymmetry, however. A value near 0 does not guarantee a single bell-shaped peak, so always check it against a plot such as the histogram below.

Visualizing Distribution: Histogram

A histogram is excellent for visualizing a distribution's shape, central tendency, and spread.

```python
# Set plot style (optional, for aesthetics)
sns.set_style("whitegrid")

# Create a histogram for flipper_length_mm with a KDE curve overlaid
plt.figure(figsize=(8, 5))
sns.histplot(data=penguins_df, x='flipper_length_mm', kde=True, bins=15, color='#4dabf7')
plt.title('Distribution of Penguin Flipper Lengths')
plt.xlabel('Flipper Length (mm)')
plt.ylabel('Frequency')
plt.show()
```

Figure: Distribution of Penguin Flipper Lengths (x-axis: Flipper Length (mm); y-axis: Frequency). Histogram showing the frequency distribution of penguin flipper lengths. The curve (KDE) provides a smooth estimate of the distribution.

Interpretation: The histogram shows where most flipper lengths fall and how widely they spread, and its shape tells us more than the skewness value alone. Rather than a single bell, the bars form two humps, one at shorter and one at longer flipper lengths. This bimodal pattern hints that the column mixes groups with different typical sizes (such as different species), which a small skewness value cannot reveal. The kde=True argument adds a Kernel Density Estimate curve, a smoothed outline of the distribution that makes such peaks easier to see.
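To see why skewness alone can miss a two-humped shape, here is a minimal sketch on hypothetical toy numbers (not the penguins data). It assumes pandas' .skew() uses the adjusted Fisher-Pearson estimator, implements that estimator by hand in a hypothetical helper, adjusted_skewness, and compares the two on a right-skewed sample and on a perfectly symmetric two-cluster sample.

```python
import numpy as np
import pandas as pd


def adjusted_skewness(values):
    """Hypothetical helper: adjusted Fisher-Pearson sample skewness.

    Assumed to match pandas' Series.skew(); the loop below compares the two.
    """
    x = np.asarray(values, dtype=float)
    n = x.size
    deviations = x - x.mean()
    m2 = np.mean(deviations ** 2)  # biased second central moment
    m3 = np.mean(deviations ** 3)  # biased third central moment
    g1 = m3 / m2 ** 1.5            # uncorrected skewness
    return g1 * np.sqrt(n * (n - 1)) / (n - 2)  # small-sample correction


# Hypothetical right-skewed sample: one long tail toward large values
right_skewed = pd.Series([1.0, 2.0, 2.0, 3.0, 10.0])

# Hypothetical two-cluster sample: clearly bimodal, yet a mirror image of itself
two_humps = pd.Series([188.0, 190.0, 190.0, 192.0, 213.0, 215.0, 215.0, 217.0])

for name, sample in [("right_skewed", right_skewed), ("two_humps", two_humps)]:
    print(f"{name}: by hand = {adjusted_skewness(sample):.3f}, pandas = {sample.skew():.3f}")
```

The hand-computed and pandas values should agree. The right-skewed sample scores clearly positive, while the two-cluster sample scores 0 because its halves mirror each other around the mean, even though a histogram of it would show two separate peaks rather than one bell.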
The bins=15 argument sets how many bars to use; experimenting with this number can reveal different features of the distribution.

Visualizing Summary Statistics: Box Plot

Box plots are effective for displaying the summary statistics (median, quartiles, range) and identifying potential outliers based on the IQR rule.

```python
# Create a box plot for flipper_length_mm
plt.figure(figsize=(6, 4))
sns.boxplot(data=penguins_df, x='flipper_length_mm', color='#96f2d7')
plt.title('Box Plot of Penguin Flipper Lengths')
plt.xlabel('Flipper Length (mm)')
plt.show()
```

Figure: Box Plot of Penguin Flipper Lengths (x-axis: Flipper Length (mm)). Box plot summarizing the distribution of penguin flipper lengths. The box represents the IQR (Q1 to Q3), and the line inside it is the median. The whiskers extend to the most extreme data points lying within 1.5 × IQR of the box; points beyond the whiskers are potential outliers.

Interpretation: The box plot shows the median (the line inside the box), the IQR (the box itself, spanning the 25% and 75% values reported by .describe()), and the overall range indicated by the whiskers. Because it reduces the data to a handful of summary values, a box plot cannot show the two peaks visible in the histogram.
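The whisker rule can be reproduced with pandas directly. Here is a minimal sketch using a small hypothetical list of flipper lengths with one deliberately extreme value (not real penguins data) and a hypothetical helper, iqr_fences. It takes Q1 and Q3 from Series.quantile (pandas' default linear interpolation, which is also how .describe() computes its quartiles) and flags anything outside the 1.5 × IQR fences. Plotting libraries may compute quartiles slightly differently on small samples, so treat this as an illustration of the rule rather than a replica of seaborn's internals.

```python
import pandas as pd


def iqr_fences(values, k=1.5):
    """Hypothetical helper: return Q1, Q3, the IQR fences, and the flagged values."""
    s = pd.Series(values, dtype=float)
    q1 = s.quantile(0.25)           # 25th percentile (linear interpolation)
    q3 = s.quantile(0.75)           # 75th percentile
    iqr = q3 - q1
    lower = q1 - k * iqr            # values below this are potential outliers
    upper = q3 + k * iqr            # values above this are potential outliers
    outliers = s[(s < lower) | (s > upper)]
    return q1, q3, lower, upper, outliers


# Hypothetical flipper lengths (mm); 290 is added on purpose as an extreme value
sample = [181, 186, 190, 195, 197, 210, 215, 217, 221, 290]
q1, q3, lower, upper, outliers = iqr_fences(sample)
print(f"Q1={q1}, Q3={q3}, fences=({lower}, {upper})")
print("Flagged:", outliers.tolist())
```

Raising k (for example to 3) widens the fences and flags only more extreme values, which is a common variant when 1.5 × IQR is too sensitive.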
In this plot, no points are drawn beyond the whiskers, so no flipper length counts as an outlier under the standard IQR rule (a value above $Q_3 + 1.5 \times \mathrm{IQR}$ or below $Q_1 - 1.5 \times \mathrm{IQR}$).

Analyzing a Categorical Variable: species

Now let's investigate the species column.

Frequency Counts and Proportions

For categorical data, we want to know how many observations fall into each category.

```python
# Calculate frequency counts for species
species_counts = penguins_df['species'].value_counts()
print("\nFrequency Counts for Species:")
print(species_counts)

# Calculate proportions, expressed as percentages
species_proportions = penguins_df['species'].value_counts(normalize=True) * 100
print("\nProportions (%) for Species:")
print(species_proportions)
```

Interpretation: The .value_counts() method lists each unique species and the number of penguins belonging to it, sorted from most to least frequent. Setting normalize=True converts these counts into proportions (percentages once multiplied by 100), showing the relative frequency of each species in the dataset.
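As a quick check on what normalize=True does, here is a small sketch on a hypothetical species column containing one missing value (toy data, not the penguins dataset). Dividing the counts by their total reproduces the normalized output, and value_counts(dropna=False) shows that missing values are excluded by default, which matters whenever a categorical column still contains NaNs.

```python
import pandas as pd

# Hypothetical categorical column with one missing entry
species = pd.Series(["Adelie", "Gentoo", "Adelie", "Chinstrap", "Adelie", None, "Gentoo"])

counts = species.value_counts()                 # missing values excluded by default
manual_pct = counts / counts.sum() * 100        # proportions among non-missing values
builtin_pct = species.value_counts(normalize=True) * 100
with_missing = species.value_counts(dropna=False)  # includes a count for the missing value

print(counts)
print(manual_pct.round(1))
print(builtin_pct.round(1))
print(with_missing)
```

If missing values are meaningful (for example, "species not recorded"), passing dropna=False keeps them visible in both the counts and the proportions.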
In this cleaned dataset, Adelie is the most common species.

Visualizing Frequencies: Bar Chart

Bar charts are ideal for comparing the frequencies of different categories.

```python
# Create a bar chart of species counts
plt.figure(figsize=(7, 5))
# countplot counts the rows per category and plots them in one step.
# Mapping colors through hue (with legend=False) avoids the deprecation
# warning that seaborn 0.13+ emits when palette is passed without hue.
species_colors = {'Adelie': '#ff8787', 'Gentoo': '#74c0fc', 'Chinstrap': '#74b816'}
sns.countplot(data=penguins_df, x='species', hue='species', palette=species_colors,
              order=species_counts.index, legend=False)
plt.title('Number of Penguins per Species')
plt.xlabel('Species')
plt.ylabel('Count')
plt.show()
```

Figure: Number of Penguins per Species (x-axis: Species; y-axis: Count). Bar chart displaying the count of penguins for each species found in the dataset.

Interpretation: The bar chart gives an immediate visual comparison of the species counts, confirming that Adelie has the highest representation in our processed data, followed by Gentoo and then Chinstrap. The order=species_counts.index argument plots the bars in descending order of frequency, which usually improves readability, and the palette argument controls the bar colors.

Summary of Practice

In this practice session, you applied univariate analysis techniques to both a numerical variable (flipper_length_mm) and a categorical variable (species) from the penguins dataset. You calculated essential descriptive statistics and generated histograms, box plots, and bar charts to visualize their distributions and frequencies. Examining variables one at a time like this is fundamental to understanding your data's basic characteristics before looking at relationships between variables.
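The one-variable-at-a-time workflow can be wrapped in a small helper. The sketch below defines a hypothetical function, univariate_summary, and applies it to a tiny made-up DataFrame: numeric columns (detected with pandas.api.types.is_numeric_dtype) get .describe() plus skewness, and every other column gets normalized value counts. On the real data you could pass penguins_df instead; the plots would still need to be produced separately.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def univariate_summary(df):
    """Hypothetical helper: summarize each column of df on its own."""
    summaries = {}
    for col in df.columns:
        s = df[col]
        if is_numeric_dtype(s):
            stats = s.describe()          # count, mean, std, min, quartiles, max
            stats["skew"] = s.skew()      # append skewness to the same Series
            summaries[col] = stats
        else:
            summaries[col] = s.value_counts(normalize=True)  # category proportions
    return summaries


# Hypothetical toy data with one numeric and one categorical column
toy = pd.DataFrame({
    "flipper_length_mm": [181.0, 186.0, 195.0, 210.0, 217.0],
    "species": ["Adelie", "Adelie", "Chinstrap", "Gentoo", "Gentoo"],
})

for name, summary in univariate_summary(toy).items():
    print(f"--- {name} ---")
    print(summary)
```

Dispatching on the column's dtype keeps the helper generic, but a numeric column that actually encodes categories (such as an integer ID) would still be summarized numerically, so the column list is worth a quick manual review.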