While barplot helps visualize an aggregate statistic (like the average value) for different categories, sometimes you simply need to know how many times each category appears in your dataset. This is a common task when exploring categorical data. For example, you might want to count the number of customers from different regions, the frequency of different sensor readings, or, as in the datasets often used for practice, the number of dining parties on different days of the week.
Seaborn provides a convenient function specifically for this purpose: countplot. It operates directly on your data to count the occurrences within each category and then displays these counts as bars. The result looks like a bar chart, but the bars represent frequencies instead of aggregate values.
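Conceptually, the bar heights countplot draws are the same numbers that pandas returns from value_counts. The following is a minimal sketch that uses a small hand-built DataFrame. The column name party_day and its values are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data: one row per dining party, recording the day it visited
parties = pd.DataFrame({"party_day": ["Sat", "Sun", "Sat", "Thur", "Sat", "Sun"]})

# Count the rows in each category; countplot draws one bar per category with this height
counts = parties["party_day"].value_counts()
print(counts)
```

Every row is counted exactly once, so the counts always add up to the number of rows in the DataFrame.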
seaborn.countplot
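Let's see how to use countplot. We'll assume you have a pandas DataFrame containing your categorical data. Common practice is to import Seaborn as sns and Matplotlib's pyplot module as plt. We also often load example datasets directly from Seaborn. Note that sns.load_dataset downloads these datasets from the online seaborn-data repository, so the first call needs an internet connection.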
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
# Load an example dataset from Seaborn
tips = sns.load_dataset("tips")
# Display the first few rows to understand the data
print(tips.head())
The tips dataset contains information about restaurant tips, including categorical columns like 'day', 'sex', 'smoker', and 'time'.
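Seaborn's load_dataset typically returns tips with columns such as 'day' stored as pandas categoricals. With that dtype, the declared category list defines the order, not alphabetical sorting. Categories that have no rows still count, with a value of zero. By default, Seaborn uses this declared order for the categorical axis. The sketch below demonstrates the pandas side of this behavior on a made-up Series.

```python
import pandas as pd

# Hypothetical data; the category list (including the unused "Fri") is declared explicitly
days = pd.Series(
    pd.Categorical(["Sat", "Thur", "Sun", "Sat"], categories=["Thur", "Fri", "Sat", "Sun"])
)

print(days.cat.categories.tolist())  # declared order, not alphabetical

# sort=False keeps the declared category order; unused categories get a count of 0
counts = days.value_counts(sort=False)
print(counts)
```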
To create a count plot showing the number of entries for each day of the week, pass the DataFrame to the data parameter and the column name ('day' in this case) to the x parameter:
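# Create a count plot for the 'day' column
plt.figure(figsize=(8, 5))  # Optional: adjust figure size
# In Seaborn 0.13+, a palette is applied through hue; assigning 'day' to hue
# and setting legend=False colors each bar without adding a redundant legend
sns.countplot(x='day', hue='day', data=tips, palette=['#74c0fc', '#4dabf7', '#339af0', '#228be6'], legend=False)
plt.title('Number of Tips Recorded per Day')
plt.xlabel('Day of the Week')
plt.ylabel('Count')
plt.show()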
This code generates a plot where each bar's height corresponds to the number of rows (records) in the tips DataFrame for that specific day.
Figure: Frequency of records for each day in the tips dataset. Saturday has the most records, while Friday has the fewest.
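To check that each bar's height is a row count, you can read the heights back from the Axes object that countplot returns and compare them with value_counts. The following is a minimal sketch with two assumptions. First, each bar is stored as a Rectangle in ax.patches, which is how Seaborn's bar-based plots are drawn. Second, a small made-up DataFrame stands in for tips so the example runs offline.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is opened
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the tips 'day' column
df = pd.DataFrame({"day": ["Sat", "Sun", "Sat", "Fri", "Sat", "Sun", "Thur"]})
order = ["Thur", "Fri", "Sat", "Sun"]

fig, ax = plt.subplots()
sns.countplot(x="day", data=df, order=order, color="#4dabf7", ax=ax)

# Sort the bars left to right by x position, then read their heights
bars = sorted(ax.patches, key=lambda p: p.get_x())
heights = [int(p.get_height()) for p in bars]

# The same counts computed directly with pandas, in the plotted order
expected = df["day"].value_counts().reindex(order).tolist()
print(heights, expected)
plt.close(fig)
```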
Just like barplot, you can create horizontal count plots by assigning the categorical column name to the y parameter instead of x:
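# Create a horizontal count plot for the 'smoker' column
plt.figure(figsize=(7, 4))
# Assign 'smoker' to hue as well so the palette applies (Seaborn 0.13+); legend=False avoids a redundant legend
sns.countplot(y='smoker', hue='smoker', data=tips, palette=['#ffc9c9', '#ff8787'], legend=False)
plt.title('Count of Smokers vs Non-Smokers')
plt.xlabel('Count')
plt.ylabel('Smoker')
plt.show()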
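When you use y=, the bars run horizontally, so each bar stores its count in its width rather than its height. The following is a minimal sketch on made-up smoker data. It assumes each bar is a Rectangle in ax.patches.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical smoker column: 3 "No" rows and 2 "Yes" rows
df = pd.DataFrame({"smoker": ["No", "Yes", "No", "No", "Yes"]})

fig, ax = plt.subplots()
sns.countplot(y="smoker", data=df, order=["No", "Yes"], color="#ff8787", ax=ax)

# Horizontal bars: sort by vertical data position, then read each bar's width (the count)
bars = sorted(ax.patches, key=lambda p: p.get_y())
widths = [int(p.get_width()) for p in bars]
print(widths)
plt.close(fig)
```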
hue
countplot also supports the hue parameter, which lets you compare counts across a second categorical variable. For instance, we can see the distribution of smokers and non-smokers within each day:
# Create a count plot for 'day' with 'smoker' as hue
plt.figure(figsize=(9, 6))
# A dict palette maps each hue level to a color explicitly: blue for No, red for Yes
sns.countplot(x='day', hue='smoker', data=tips, palette={'No': '#a5d8ff', 'Yes': '#ffc9c9'})
plt.title('Count per Day, Separated by Smoker Status')
plt.xlabel('Day of the Week')
plt.ylabel('Count')
plt.show()
This plot now shows a pair of bars for each day: one for the count of non-smokers ('No') and one for the count of smokers ('Yes'). Seaborn automatically adds a legend to clarify which color corresponds to which category within the hue variable.
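The bars in a hue plot correspond to the cells of a two-way frequency table. The pandas crosstab function produces the same numbers, which is handy when you want exact values instead of reading them off the chart. The following sketch uses a made-up DataFrame.

```python
import pandas as pd

# Hypothetical parties: day of visit and whether the party included smokers
df = pd.DataFrame({
    "day":    ["Sat", "Sat", "Sun", "Sat", "Sun", "Thur"],
    "smoker": ["No",  "Yes", "No",  "Yes", "No",  "No"],
})

# Rows are the x categories, columns are the hue levels; each cell is one bar's height
table = pd.crosstab(df["day"], df["smoker"])
print(table)
```

Combinations that never occur, such as smokers on Sunday in this toy data, appear as 0. These cells correspond to the zero-height slots in the plot.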
Common countplot Parameters
- x, y: The name of the column in your DataFrame to plot along the horizontal (x) or vertical (y) axis. You typically use only one of these to define the primary categorical variable.
- data: The pandas DataFrame containing the data.
- hue: The name of a second categorical column in your DataFrame. This adds subgroup comparisons within each primary category defined by x or y.
- order, hue_order: Lists of strings that specify the exact order in which categories appear on the axis (order) or in the legend (hue_order).
- palette: Controls the colors used for the bars. You can use Seaborn palette names (like 'viridis' or 'magma'), a list of color codes, or a dictionary mapping each hue level to a color. In Seaborn 0.13 and later, palette is meant to be used together with hue; passing it without hue triggers a deprecation warning.

countplot is a simple but effective tool for quickly understanding the frequency distribution of your categorical variables. It is often one of the first plots generated when exploring a new dataset that contains categories.
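One common use of order is sorting bars from most to least frequent. The sketch below shows one way to do this by passing the index of value_counts as the order list. The data is made up, and the example assumes each bar is a Rectangle in ax.patches.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical categorical column
df = pd.DataFrame({"region": ["West", "East", "West", "North", "West", "East"]})

# value_counts sorts in descending order, so its index lists categories from most to least frequent
freq_order = df["region"].value_counts().index.tolist()

fig, ax = plt.subplots()
sns.countplot(x="region", data=df, order=freq_order, color="#74c0fc", ax=ax)

# Reading the bars left to right should now give non-increasing heights
heights = [int(p.get_height()) for p in sorted(ax.patches, key=lambda p: p.get_x())]
print(freq_order, heights)
plt.close(fig)
```

The same idea works for hue_order if you want the legend entries sorted by frequency.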
© 2025 ApX Machine Learning