After computing frequency counts for categorical variables, the next logical step is to visualize these distributions. While tables of numbers are precise, graphical representations often provide a more immediate understanding of the relative frequencies and patterns within the data. The most common and effective visualization for displaying the frequency or proportion of categories in a single categorical variable is the bar chart.
A bar chart uses rectangular bars whose lengths are proportional to the values they represent. For univariate categorical analysis, the bars typically show the count (frequency) or proportion of observations falling into each category. This makes it straightforward to compare categories at a glance.
Python libraries like Matplotlib and Seaborn provide convenient functions for generating bar charts directly from Pandas Series or DataFrames. Seaborn, built on top of Matplotlib, offers functions specifically designed for statistical visualization, often requiring less code for common plots.
A common way to create a bar chart for category counts is using Seaborn's countplot function. It automatically calculates the frequency of each category in the specified column and plots it.
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
# Assume 'df' is your DataFrame and 'product_category' is the column of interest
# Example DataFrame creation:
data = {'product_category': ['Electronics', 'Clothing', 'Groceries', 'Electronics', 'Clothing', 'Electronics', 'Home Goods', 'Clothing', 'Groceries', 'Electronics']}
df = pd.DataFrame(data)
plt.figure(figsize=(8, 5)) # Optional: Adjust figure size
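# Note: seaborn 0.13+ emits a FutureWarning when palette is passed without hue;
# on those versions, also pass hue='product_category' and legend=False.
sns.countplot(data=df, x='product_category', palette=['#4dabf7', '#69db7c', '#ff922b', '#be4bdb']) # One color per category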
plt.title('Frequency of Product Categories')
plt.xlabel('Product Category')
plt.ylabel('Count')
plt.xticks(rotation=45) # Rotate labels if they overlap
plt.tight_layout() # Adjust layout
plt.show()
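By default, countplot draws string categories in the order they first appear in the data. If you would rather see the most frequent category first, one option is to pass an explicit order. The sketch below reuses the illustrative data from the example above and derives the order from value_counts(); the variable name order is just a local choice.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Same illustrative data as in the example above
df = pd.DataFrame({'product_category': [
    'Electronics', 'Clothing', 'Groceries', 'Electronics', 'Clothing',
    'Electronics', 'Home Goods', 'Clothing', 'Groceries', 'Electronics']})

# value_counts() sorts by descending count, so its index gives the bar order
order = df['product_category'].value_counts().index

plt.figure(figsize=(8, 5))
ax = sns.countplot(data=df, x='product_category', order=order)
ax.set_title('Product Categories, Most Frequent First')
ax.set_xlabel('Product Category')
ax.set_ylabel('Count')
plt.tight_layout()
plt.show()
```

Sorting by frequency tends to make rankings easier to read, though for ordinal categories (such as rating scales) the natural order is usually the better choice.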
Alternatively, you can calculate the value counts using Pandas first and then use Matplotlib's or Pandas' plotting functions:
import matplotlib.pyplot as plt
import pandas as pd
# Assume 'df' is your DataFrame and 'product_category' is the column
# Example DataFrame creation (same as above):
data = {'product_category': ['Electronics', 'Clothing', 'Groceries', 'Electronics', 'Clothing', 'Electronics', 'Home Goods', 'Clothing', 'Groceries', 'Electronics']}
df = pd.DataFrame(data)
category_counts = df['product_category'].value_counts()
plt.figure(figsize=(8, 5))
category_counts.plot(kind='bar', color=['#4dabf7', '#69db7c', '#ff922b', '#be4bdb'])
plt.title('Frequency of Product Categories')
plt.xlabel('Product Category')
plt.ylabel('Count')
plt.xticks(rotation=45, ha='right') # Rotate and align labels
plt.tight_layout()
plt.show()
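Both methods achieve a similar result. Seaborn's countplot is slightly more direct for simple frequency plots, while the Pandas approach gives you the counts explicitly before plotting, which is useful if you want to inspect, sort, or reuse them. Note that the bar order can differ: value_counts() sorts categories by descending count, whereas countplot keeps the order in which string categories first appear in the data.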
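The examples above plot raw counts. To show proportions instead, one possible approach is to let Pandas normalize the counts before plotting. This is a minimal sketch on the same illustrative data; category_props is a name chosen here for the example.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Same illustrative data as in the examples above
df = pd.DataFrame({'product_category': [
    'Electronics', 'Clothing', 'Groceries', 'Electronics', 'Clothing',
    'Electronics', 'Home Goods', 'Clothing', 'Groceries', 'Electronics']})

# normalize=True returns each category's share of all rows instead of raw counts
category_props = df['product_category'].value_counts(normalize=True)

ax = category_props.plot(kind='bar', figsize=(8, 5))
ax.set_title('Proportion of Product Categories')
ax.set_xlabel('Product Category')
ax.set_ylabel('Proportion')
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
plt.show()
```

The bars have the same relative heights as in the count version; only the y-axis scale changes, which helps when comparing datasets of different sizes.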
When examining a bar chart for a categorical variable, keep label readability in mind: if category names are long or there are many categories, consider a horizontal bar chart (kind='barh' in Pandas plot, or y= instead of x= in Seaborn).
Below is an example visualization showing the distribution of fictional customer satisfaction ratings using Plotly.
Distribution of customer satisfaction responses, showing 'Satisfied' as the most common rating.
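As for the horizontal orientation mentioned earlier, a rough sketch with Pandas could look like the following. It reuses the illustrative product data from this section and sorts the counts in ascending order so that the largest bar ends up at the top; that sorting step is a presentation choice made here, not a requirement.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Same illustrative data as in the earlier examples
df = pd.DataFrame({'product_category': [
    'Electronics', 'Clothing', 'Groceries', 'Electronics', 'Clothing',
    'Electronics', 'Home Goods', 'Clothing', 'Groceries', 'Electronics']})

# barh draws the first row at the bottom, so ascending order puts the
# most frequent category at the top of the chart
category_counts = df['product_category'].value_counts().sort_values()

ax = category_counts.plot(kind='barh', figsize=(8, 5))
ax.set_title('Frequency of Product Categories')
ax.set_xlabel('Count')
ax.set_ylabel('Product Category')
plt.tight_layout()
plt.show()
```

With Seaborn, the comparable call would be sns.countplot(data=df, y='product_category'), which places the categories on the y-axis.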
Bar charts are a fundamental tool in your EDA toolkit for understanding the composition of categorical data. They transform frequency tables into an easily digestible visual format, highlighting the prevalence and distribution of different groups within your dataset.