Alright, let's put the concepts from this chapter into practice. We'll use Seaborn's functions to explore categorical features within a dataset. These hands-on exercises will help solidify your understanding of how to choose and create appropriate visualizations for categorical data.
For these examples, we'll use the 'tips' dataset, which is conveniently included with Seaborn. It contains information about restaurant tips, including categorical variables like the day of the week, time of day, gender of the person paying, and whether they were a smoker.
First, let's import the necessary libraries and load the dataset. We need Pandas for potential data handling (though Seaborn often handles DataFrames directly), Matplotlib for the underlying plotting engine (and potential customizations), and Seaborn itself.
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
# Load the example dataset
tips = sns.load_dataset("tips")
# Display the first few rows to understand the data
print(tips.head())
You should see output similar to this, showing the columns total_bill, tip, sex, smoker, day, time, and size.
   total_bill   tip     sex smoker  day    time  size
0       16.99  1.01  Female     No  Sun  Dinner     2
1       10.34  1.66    Male     No  Sun  Dinner     3
2       21.01  3.50    Male     No  Sun  Dinner     3
3       23.68  3.31    Male     No  Sun  Dinner     2
4       24.59  3.61  Female     No  Sun  Dinner     4
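Before plotting, it can help to check which columns pandas treats as categorical, because Seaborn generally follows the category order when it lays out a categorical axis. The sketch below uses a small hand-made frame named sample (its rows are invented for illustration and are not taken from the dataset), so it runs without downloading anything. The same calls should work on tips.

```python
import pandas as pd

# Hypothetical rows shaped like the tips data (values invented for illustration)
sample = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68],
    "day": pd.Categorical(["Sun", "Thur", "Sat", "Sun"],
                          categories=["Thur", "Fri", "Sat", "Sun"]),
    "time": pd.Categorical(["Dinner", "Lunch", "Dinner", "Dinner"],
                           categories=["Lunch", "Dinner"]),
})

# Select the columns that use the pandas 'category' dtype
categorical_cols = sample.select_dtypes(include="category").columns.tolist()
print(categorical_cols)

# The declared category order, which is independent of row order
day_levels = list(sample["day"].cat.categories)
print(day_levels)
```

If a column arrives as plain strings instead, Seaborn falls back to the order in which values first appear, so declaring the categories explicitly is one way to keep plots consistent.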
Now, let's create some plots.
countplot
Often, the first step in analyzing categorical data is understanding how many observations fall into each category. The countplot function is perfect for this.
Task: Create a plot showing the number of tips recorded for each day of the week.
# Create the countplot
plt.figure(figsize=(8, 5)) # Optional: Adjust figure size for better readability
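# Assign 'day' to hue as well so the palette maps one color per bar (Seaborn 0.13+ deprecates palette without hue)
sns.countplot(data=tips, x='day', hue='day', legend=False, palette=['#74c0fc', '#4dabf7', '#339af0', '#228be6']) # Using colors from blue palette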
plt.title('Number of Tips Recorded per Day')
plt.xlabel('Day of the Week')
plt.ylabel('Count')
plt.show()
Interpretation: This plot shows the frequency count for each category in the 'day' column. The dataset covers only Thursday through Sunday, and you should see more records on Saturday and Sunday than on Thursday or Friday, reflecting typical restaurant patronage patterns.
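The bar heights in a count plot are plain tallies, so they can be cross-checked with pandas. The following is a minimal sketch on a hypothetical Series named days (the labels are invented for illustration); on the real data the equivalent call would be tips['day'].value_counts().

```python
import pandas as pd

# Hypothetical day labels (invented for illustration, not the real dataset)
days = pd.Series(["Sat", "Sun", "Sat", "Thur", "Fri", "Sat", "Sun"], name="day")

# value_counts tallies rows per category, which is what each bar's height encodes
counts = days.value_counts()
print(counts)

# Reorder to a fixed weekday order so the table reads like the plot's x-axis
ordered = counts.reindex(["Thur", "Fri", "Sat", "Sun"])
print(ordered.to_dict())
```

Printing the counts alongside the plot is a quick way to confirm that no category was dropped or misspelled.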
barplot
Bar plots are useful for comparing an average numerical value across different categories. Seaborn's barplot automatically calculates the mean (by default) and draws a 95% confidence interval as error bars unless you specify otherwise.
Task: Visualize the average total_bill for each day of the week.
# Create the barplot
plt.figure(figsize=(8, 5))
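# errorbar='sd' draws the standard deviation; hue='day' lets the palette color each bar (Seaborn 0.13+)
sns.barplot(data=tips, x='day', y='total_bill', hue='day', legend=False, palette=['#96f2d7', '#63e6be', '#38d9a9', '#20c997'], errorbar='sd') # Using colors from teal palette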
plt.title('Average Total Bill per Day')
plt.xlabel('Day of the Week')
plt.ylabel('Average Total Bill ($)')
plt.show()
Average total bill for each day. Because the code passes errorbar='sd', the error bars show one standard deviation on either side of the mean rather than a confidence interval.
Interpretation: This plot displays the mean total_bill for each day. The vertical lines (error bars) show the standard deviation, which describes how widely individual bills vary around the mean, not how precisely the mean is estimated. If you omit the errorbar argument, Seaborn draws a 95% confidence interval instead, which is the better choice for judging whether differences in average bill amounts between days are likely to be more than noise.
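The difference between a standard deviation and a confidence interval can be made concrete with a small calculation. This sketch uses a hypothetical frame named bills with invented values. The 95% interval here is a normal approximation chosen for simplicity, which is an assumption of this example; Seaborn's default confidence interval is computed by bootstrapping, so its numbers will differ somewhat.

```python
import numpy as np
import pandas as pd

# Hypothetical bills (invented values, not the real dataset)
bills = pd.DataFrame({
    "day": ["Sat", "Sat", "Sat", "Sat", "Sun", "Sun", "Sun", "Sun"],
    "total_bill": [10.0, 20.0, 30.0, 40.0, 22.0, 24.0, 26.0, 28.0],
})

# Per-day mean, sample standard deviation, and group size
summary = bills.groupby("day")["total_bill"].agg(["mean", "std", "count"])

# Standard error of the mean: shrinks as the group grows, unlike the standard deviation
summary["sem"] = summary["std"] / np.sqrt(summary["count"])

# Rough 95% interval half-width via a normal approximation (assumption of this sketch)
summary["ci95_half_width"] = 1.96 * summary["sem"]
print(summary.round(2))
```

Both days have the same mean in this toy data, yet the spread of individual bills is very different, which is the kind of contrast that errorbar='sd' exposes and a confidence interval alone would understate.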
boxplot
To understand the spread and central tendency of a numerical variable for different categories, box plots are excellent.
Task: Compare the distribution of tip amounts between smokers and non-smokers.
# Create the boxplot
plt.figure(figsize=(7, 5))
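# Assign 'smoker' to hue so the palette maps one color per box (Seaborn 0.13+)
sns.boxplot(data=tips, x='smoker', y='tip', hue='smoker', legend=False, palette=['#ffc9c9', '#74c0fc']) # Using red and blue palette colors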
plt.title('Distribution of Tip Amounts by Smoking Status')
plt.xlabel('Smoker')
plt.ylabel('Tip Amount ($)')
plt.show()
Interpretation: Each box shows the median (middle line), the interquartile range (IQR, the box itself), and potential outliers (points beyond the whiskers). By comparing the boxes for 'Yes' and 'No' smokers, you can assess differences in typical tip amounts, the spread of tips, and the presence of unusually high or low tips within each group.
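To see where the box edges and outlier cutoffs come from, the same statistics can be computed by hand. This is a minimal sketch on a hypothetical array named tips_sample (values invented for illustration). It assumes the default whisker rule of 1.5 times the IQR, which is what Seaborn's boxplot uses unless the whis argument is changed.

```python
import numpy as np

# Hypothetical tip amounts (invented values, not the real dataset)
tips_sample = np.array([1.0, 2.0, 2.5, 3.0, 3.5, 4.0, 10.0])

# Quartiles define the box; the median is the line inside it
q1, median, q3 = np.percentile(tips_sample, [25, 50, 75])
iqr = q3 - q1

# Points farther than 1.5 * IQR from the box edges are drawn as outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = tips_sample[(tips_sample < lower_fence) | (tips_sample > upper_fence)]

print(f"median={median}, IQR={iqr}, outliers={outliers.tolist()}")
```

Here only the 10.0 tip falls outside the fences, so it is the one point a box plot of this sample would mark individually.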
swarmplot
Sometimes, seeing every data point is informative, especially when comparing distributions across categories with a moderate number of observations. swarmplot arranges points so they don't overlap, giving a sense of density.
Task: Visualize individual tip amounts based on the time of the meal (Lunch or Dinner).
# Create the swarmplot
plt.figure(figsize=(7, 5))
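# Assign 'time' to hue so the palette maps one color per category (Seaborn 0.13+)
sns.swarmplot(data=tips, x='time', y='tip', hue='time', legend=False, palette=['#ffe066', '#fd7e14']) # Using yellow and orange palette colors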
plt.title('Individual Tip Amounts by Time of Day')
plt.xlabel('Time')
plt.ylabel('Tip Amount ($)')
plt.show()
Interpretation: This plot displays each individual tip as a distinct point, positioned according to its value and category ('Lunch' or 'Dinner'). The horizontal arrangement within each category helps visualize the density of tips at different values. It complements the boxplot by showing the raw data points that contribute to the summary statistics.
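One way to combine the summary and the raw points is to draw both layers on the same axes. The sketch below is an illustration rather than a prescribed recipe: the meals frame is hypothetical with invented values, and the Agg backend line is only there so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # Non-interactive backend so the sketch also runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical rows (invented values, not the real dataset)
meals = pd.DataFrame({
    "time": ["Lunch"] * 5 + ["Dinner"] * 5,
    "tip": [1.5, 2.0, 2.2, 2.5, 3.0, 2.0, 3.0, 3.5, 4.0, 5.5],
})

fig, ax = plt.subplots(figsize=(7, 5))
# Summary layer: neutral boxes so the points stay readable
sns.boxplot(data=meals, x="time", y="tip", color="lightgray", ax=ax)
# Raw-data layer: one marker per row, drawn on the same axes
sns.swarmplot(data=meals, x="time", y="tip", color="black", size=5, ax=ax)
ax.set_title("Tips by Time of Day: Summary and Raw Points")
fig.canvas.draw()  # Render once; in an interactive session use plt.show() instead
```

Passing the same ax to both calls is what stacks the layers. In a notebook, drop the matplotlib.use line and finish with plt.show().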
pointplot
Point plots are effective for comparing point estimates (like the mean) and their confidence intervals across different categories, particularly when looking for trends or interactions with a second categorical variable (using hue).
Task: Show the average tip amount according to the size of the party (number of people), separated by the sex of the payer.
# Create the pointplot
plt.figure(figsize=(9, 6))
sns.pointplot(data=tips, x='size', y='tip', hue='sex', palette={'Male': '#1c7ed6', 'Female': '#f06595'}, markers=['o', 's'], linestyles=['-', '--']) # Using blue and pink palette colors
plt.title('Average Tip Amount by Party Size and Payer Gender')
plt.xlabel('Party Size')
plt.ylabel('Average Tip Amount ($)')
plt.legend(title='Payer Gender')
plt.show()
Interpretation: This plot connects the average tip amount (points) for each party size with lines, separately for male and female payers. The vertical lines represent confidence intervals. This visualization makes it easy to compare how average tips change with party size and whether this trend differs between genders. For example, you might observe if tips increase more steeply with party size for one gender compared to the other.
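The points in such a plot are group means, one per combination of the x variable and the hue variable, so they can be tabulated directly. This sketch uses a hypothetical frame named parties with invented values. The slope row is a deliberately crude comparison (last party size minus first), offered only to illustrate the idea of comparing trends between hue levels.

```python
import pandas as pd

# Hypothetical rows (invented values, not the real dataset)
parties = pd.DataFrame({
    "size": [2, 2, 2, 2, 4, 4, 4, 4],
    "sex": ["Male", "Male", "Female", "Female"] * 2,
    "tip": [2.0, 3.0, 2.5, 2.5, 4.0, 5.0, 3.0, 4.0],
})

# One mean per (size, sex) pair; unstack puts each hue level in its own column
mean_tips = parties.groupby(["size", "sex"])["tip"].mean().unstack("sex")
print(mean_tips)

# Change in the mean from the smallest to the largest party size, per hue level
slope = mean_tips.iloc[-1] - mean_tips.iloc[0]
print(slope.to_dict())
```

Each column of mean_tips corresponds to one line in the point plot, and a larger value in slope corresponds to a steeper line.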
These exercises demonstrated how to use several key Seaborn functions (countplot, barplot, boxplot, swarmplot, pointplot) to effectively visualize categorical data. You learned to display frequencies, compare average values, examine distributions, show individual data points, and analyze trends across categories. Experiment further by swapping variables, trying different plot types (like violinplot or stripplot), and exploring the various customization options available in Seaborn to gain deeper insights from your own categorical data.
© 2025 ApX Machine Learning