In data analysis, not all data comes in the form of numbers that you can directly measure or count. Often, you'll encounter data that represents qualities, types, or groups. This type of data is known as categorical data. Think of it as assigning labels or placing observations into distinct bins.
For instance, if you are analyzing customer data, features like 'Gender' (Male, Female, Other), 'City' (New York, London, Tokyo), or 'Subscription Type' (Basic, Premium, Enterprise) are all categorical. Similarly, in scientific experiments, 'Treatment Group' (Control, Treatment A, Treatment B) is categorical. Even simple 'Yes'/'No' responses in a survey fall under this umbrella.
It's helpful to distinguish categorical data from numerical data. Numerical data represents quantities and can be measured on a scale. Examples include 'Age' (e.g., 35 years), 'Temperature' (e.g., 25.5 °C), or 'Revenue' (e.g., $15,750). You can perform meaningful arithmetic operations like calculating averages or sums on numerical data. Trying to average 'City' names, however, doesn't make sense.
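As a rough illustration of this difference, the sketch below uses pandas on a small, made-up customer table. The column names and values are hypothetical and chosen only to mirror the examples above. Averages and sums are computed for the numerical columns, while the categorical column is summarized by counting group members.

```python
import pandas as pd

# Hypothetical customer records, invented for illustration only.
customers = pd.DataFrame({
    "Age": [35, 42, 28, 51],
    "Revenue": [15750.0, 8200.0, 12300.0, 9900.0],
    "City": ["New York", "London", "Tokyo", "London"],
})

# Arithmetic summaries are meaningful for numerical columns.
mean_age = customers["Age"].mean()
total_revenue = customers["Revenue"].sum()

# For a categorical column, the natural summary is a count per group.
city_counts = customers["City"].value_counts()

print(mean_age)
print(total_revenue)
print(city_counts)
```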
Figure: A basic distinction between numerical data (measurable quantities) and categorical data (labels or groups).
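Within categorical data, the categories sometimes have a natural order and sometimes don't. Categories with a meaningful order, such as 'Low', 'Medium', 'High', are called ordinal; categories without one, such as 'City', are called nominal.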
Recognizing this difference can sometimes guide your analysis or choice of visualization, but the primary focus here is on techniques that work well for any kind of categorical grouping.
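One possible way to record whether categories carry an order is the pandas `Categorical` type. This is a minimal sketch. The ranking Basic < Premium < Enterprise is an assumption made for the example, not something the data itself guarantees.

```python
import pandas as pd

# Unordered categories: no city is "greater" than another.
city = pd.Categorical(["Tokyo", "London", "Tokyo"])

# Ordered categories: the order is declared explicitly.
# The ranking below is an assumption made for this illustration.
plan = pd.Categorical(
    ["Premium", "Basic", "Enterprise", "Basic"],
    categories=["Basic", "Premium", "Enterprise"],
    ordered=True,
)

print(city.ordered)
print(plan.ordered)

# Order-based operations such as min and max are only defined
# for the ordered variant.
print(plan.min(), plan.max())
```

Declaring the order up front also means that sorting and plot axes can follow the declared category order rather than alphabetical order.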
Why dedicate specific visualization techniques to categorical data? Because understanding the characteristics of different groups is fundamental to many data analysis tasks.
Because categorical data represents groups rather than continuous values, standard plots like basic line charts are often unsuitable. Instead, we need visualizations designed to show counts, compare statistical summaries across groups, or display the distribution of data points within each category.
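Before any plotting, the three kinds of information mentioned here (counts, per-group summaries, and within-group distributions) can be computed directly with pandas. The sketch below uses a small hypothetical experiment table whose names and values are invented for illustration.

```python
import pandas as pd

# Hypothetical experiment results, invented for illustration only.
results = pd.DataFrame({
    "group": ["Control", "Control", "Treatment A",
              "Treatment A", "Treatment B", "Treatment B"],
    "score": [10.0, 12.0, 15.0, 17.0, 11.0, 19.0],
})

# 1. Counts: how many observations fall in each category.
counts = results["group"].value_counts()

# 2. Statistical summary per group: here, the mean score.
means = results.groupby("group")["score"].mean()

# 3. Distribution within each category: min, quartiles, max, and more.
spread = results.groupby("group")["score"].describe()

print(counts)
print(means)
print(spread)
```

Each of these tables corresponds to a question that a categorical plot answers visually.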
This chapter introduces Seaborn functions specifically built for these purposes. You'll learn how functions like countplot, barplot, boxplot, and others provide clear and informative ways to visualize your categorical data, making it easier to extract insights about the different groups within your dataset.
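As a brief preview, the following sketch draws the three functions named above side by side. It assumes Seaborn, Matplotlib, and pandas are installed, and it uses a small hypothetical dataset invented for illustration, so the resulting figure says nothing about real data.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to display plots interactively
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical experiment results, invented for illustration only.
df = pd.DataFrame({
    "group": ["Control", "Control", "Treatment A",
              "Treatment A", "Treatment B", "Treatment B"],
    "score": [10.0, 12.0, 15.0, 17.0, 11.0, 19.0],
})

fig, axes = plt.subplots(1, 3, figsize=(12, 4))

# Number of observations in each category.
sns.countplot(data=df, x="group", ax=axes[0])
axes[0].set_title("countplot")

# A summary statistic (the mean, by default) for each category.
sns.barplot(data=df, x="group", y="score", ax=axes[1])
axes[1].set_title("barplot")

# The distribution of values within each category.
sns.boxplot(data=df, x="group", y="score", ax=axes[2])
axes[2].set_title("boxplot")

fig.tight_layout()
```

To view the result, call `fig.savefig(...)` with a file name of your choice, or remove the backend line and call `plt.show()`.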
© 2025 ApX Machine Learning