While bar plots and box plots give us useful summaries (such as means or quartiles) for each category, they don't show the individual data points. Seeing every observation is sometimes essential for understanding the distribution, density, and potential outliers within each category. Seaborn provides two functions for this purpose: stripplot and swarmplot. Both create scatter plots in which one axis represents a categorical variable.
The stripplot function is the most straightforward way to visualize individual data points grouped by category. It draws a scatter plot where the position on one axis (usually the x-axis) corresponds to the category, and the position on the other axis (usually the y-axis) corresponds to the numerical value.
Let's imagine we have data on restaurant tips and want to see the distribution of total bill amounts for each day of the week.
import seaborn as sns
import matplotlib.pyplot as plt
# Load the example tips dataset
tips = sns.load_dataset("tips")
# Create a basic strip plot; jitter is switched off so that overlapping points stay visible
# (recent seaborn versions apply jitter by default)
plt.figure(figsize=(8, 5))  # Set figure size for better readability
sns.stripplot(x="day", y="total_bill", data=tips, jitter=False)
plt.title("Total Bill Amount per Day (Strip Plot)")
plt.xlabel("Day of the Week")
plt.ylabel("Total Bill ($)")
plt.show()
A simplified example showing individual bill amounts plotted for different days using a strip plot approach.
In the output of the code example (when run on the full dataset), many points fall directly on top of each other, especially where similar bill amounts occur repeatedly on the same day. This overlap makes it hard to tell how many points share a given value or to gauge the density of the data.
To handle the overlap, stripplot has a useful parameter: jitter. Setting jitter=True (or passing a float to control the amount) adds a small amount of random noise to the positions along the categorical axis. This spreads the points out horizontally within their category column, making individual markers easier to see. In recent seaborn releases jitter=True is the default, so the spreading is applied unless you pass jitter=False.
import seaborn as sns
import matplotlib.pyplot as plt
# Load the example tips dataset
tips = sns.load_dataset("tips")
# Create a strip plot with jitter and a palette
plt.figure(figsize=(8, 5))
# Recent seaborn versions deprecate palette without hue, so hue is mapped to the
# same variable and the redundant legend is hidden (legend= needs seaborn 0.12+)
sns.stripplot(x="day", y="total_bill", hue="day", data=tips, jitter=True, palette="Blues", legend=False)
plt.title("Total Bill Amount per Day (Strip Plot with Jitter)")
plt.xlabel("Day of the Week")
plt.ylabel("Total Bill ($)")
plt.show()
Applying jitter horizontally spreads out points within each category, reducing overlap seen in the basic strip plot.
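To make the effect of the jitter amount concrete, here is a minimal sketch that draws the same data with three settings side by side. It uses a small synthetic DataFrame (the name demo and its values are invented for illustration, standing in for the tips data) so that it runs without downloading anything; the specific amounts 0.05 and 0.3 are arbitrary choices.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the tips data (values are made up for illustration)
rng = np.random.default_rng(0)
days = ["Thur", "Fri", "Sat", "Sun"]
demo = pd.DataFrame({
    "day": rng.choice(days, size=200),
    "total_bill": rng.gamma(shape=4.0, scale=5.0, size=200).round(2),
})

# One panel per jitter setting: off, narrow, wide
jitter_settings = [False, 0.05, 0.3]
fig, axes = plt.subplots(1, 3, figsize=(12, 4), sharey=True)
for ax, jitter in zip(axes, jitter_settings):
    sns.stripplot(x="day", y="total_bill", data=demo, order=days,
                  jitter=jitter, size=3, ax=ax)
    ax.set_title(f"jitter={jitter}")
    ax.set_xlabel("Day of the Week")
axes[0].set_ylabel("Total Bill ($)")
fig.tight_layout()
plt.show()
```

A wider jitter separates more points but also blurs the boundary between neighbouring categories, so the amount is usually a trade-off rather than a case of more being better.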
Jitter significantly improves visibility for moderately sized datasets. You can also map another variable to color with the hue parameter to compare subgroups within each category.
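As a rough illustration of the hue mapping, the sketch below colors points by a second categorical column and uses dodge=True to place the subgroups side by side within each day. The DataFrame demo and its time column are synthetic placeholders invented for this example, not the real tips data.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the tips data (values are made up for illustration)
rng = np.random.default_rng(1)
days = ["Thur", "Fri", "Sat", "Sun"]
demo = pd.DataFrame({
    "day": rng.choice(days, size=200),
    "time": rng.choice(["Lunch", "Dinner"], size=200),
    "total_bill": rng.gamma(shape=4.0, scale=5.0, size=200).round(2),
})

fig, ax = plt.subplots(figsize=(8, 5))
# hue colors the points by subgroup; dodge=True separates the subgroups
# along the categorical axis instead of mixing them in one column
sns.stripplot(x="day", y="total_bill", hue="time", data=demo, order=days,
              dodge=True, jitter=True, size=4, ax=ax)
ax.set_title("Total Bill per Day, Split by Time (Strip Plot)")
ax.set_xlabel("Day of the Week")
ax.set_ylabel("Total Bill ($)")
plt.show()
```

Without dodge=True the two subgroups share one column per day and are distinguished by color alone, which can be enough when the groups are small.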
While jitter helps, the noise is random, so the exact horizontal position of a point carries no meaning. Seaborn offers swarmplot as an alternative that positions points along the categorical axis so that they do not overlap. It arranges the points like bees in a swarm, showing the distribution density more clearly.
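import seaborn as sns
import matplotlib.pyplot as plt
# Load the example tips dataset
tips = sns.load_dataset("tips")
# Create a swarm plot with a palette and smaller markers
plt.figure(figsize=(8, 5))
# Recent seaborn versions deprecate palette without hue, so hue is mapped to the
# same variable and the redundant legend is hidden (legend= needs seaborn 0.12+)
sns.swarmplot(x="day", y="total_bill", hue="day", data=tips, palette="viridis", size=4, legend=False)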
plt.title("Total Bill Amount per Day (Swarm Plot)")
plt.xlabel("Day of the Week")
plt.ylabel("Total Bill ($)")
plt.show()
Swarm plot arranges points to avoid overlap, giving a clearer view of the distribution density within each day compared to jittered strip plots. Note: This sample uses fewer points for clarity.
The main advantage of swarmplot is the clear representation of the distribution shape and density at different values. You can easily see where data points are concentrated.
However, the algorithm to place points without overlap can be computationally intensive. For very large datasets (many thousands of points), generating a swarm plot can become slow, and the plot itself might become too dense to interpret effectively.
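Choosing between stripplot and swarmplot
Use stripplot (with jitter=True) when:
The dataset is large enough that swarmplot is too slow or creates plots that are too dense.
Use swarmplot when:
The dataset is small to moderate in size and you want a clear view of the distribution's shape and density.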
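One way to apply this rule of thumb in code is a small helper that picks the plot type from the number of rows. The function plot_points, its max_swarm_points parameter, and the default threshold of 1000 are all hypothetical choices made for this sketch; they are not part of seaborn, and a sensible cutoff depends on marker size and figure dimensions. The data is synthetic.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns


def plot_points(data, x, y, max_swarm_points=1000, ax=None):
    """Hypothetical helper: swarm plot for small data, jittered strip plot otherwise.

    Returns the name of the plot type that was drawn.
    """
    if ax is None:
        ax = plt.gca()
    if len(data) <= max_swarm_points:
        # Few enough points for the non-overlapping layout to stay fast and readable
        sns.swarmplot(x=x, y=y, data=data, size=3, ax=ax)
        return "swarm"
    # Too many points: fall back to jitter, with transparency to hint at density
    sns.stripplot(x=x, y=y, data=data, jitter=True, alpha=0.3, size=2, ax=ax)
    return "strip"


def make_demo(n_rows, seed):
    """Build a synthetic stand-in for the tips data (values are made up)."""
    rng = np.random.default_rng(seed)
    return pd.DataFrame({
        "day": rng.choice(["Thur", "Fri", "Sat", "Sun"], size=n_rows),
        "total_bill": rng.gamma(shape=4.0, scale=5.0, size=n_rows),
    })


fig, (ax_small, ax_large) = plt.subplots(1, 2, figsize=(12, 5))
kind_small = plot_points(make_demo(150, seed=0), "day", "total_bill", ax=ax_small)
kind_large = plot_points(make_demo(5000, seed=1), "day", "total_bill", ax=ax_large)
ax_small.set_title(f"150 rows: {kind_small}")
ax_large.set_title(f"5000 rows: {kind_large}")
plt.show()
```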
Both stripplot and swarmplot accept similar arguments for customization, including hue to add another categorical dimension using color, palette to control colors, and size to adjust marker size. They are powerful tools for looking beyond summary statistics and examining the raw data points within your categories. Often, they are effectively combined with plots like boxplot or violinplot (by plotting them on the same Axes) to provide both summary statistics and individual point distributions simultaneously.
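The combination described above can be sketched by passing the same Axes object to both functions. In this minimal example the box plot supplies the summary layer and the strip plot overlays the individual points. The DataFrame demo is synthetic (its values are invented for illustration), and the colors and transparency are arbitrary styling choices.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the tips data (values are made up for illustration)
rng = np.random.default_rng(2)
days = ["Thur", "Fri", "Sat", "Sun"]
demo = pd.DataFrame({
    "day": rng.choice(days, size=120),
    "total_bill": rng.gamma(shape=4.0, scale=5.0, size=120).round(2),
})

fig, ax = plt.subplots(figsize=(8, 5))
# Summary layer: hide the box plot's own outlier markers,
# because every observation is drawn by the strip plot below
sns.boxplot(x="day", y="total_bill", data=demo, order=days,
            color="lightgray", showfliers=False, ax=ax)
# Raw-data layer: drawn on the same Axes so the points sit on top of the boxes
sns.stripplot(x="day", y="total_bill", data=demo, order=days,
              color="black", alpha=0.5, size=3, jitter=True, ax=ax)
ax.set_title("Total Bill per Day (Box Plot with Strip Plot Overlay)")
ax.set_xlabel("Day of the Week")
ax.set_ylabel("Total Bill ($)")
plt.show()
```

Passing the same order to both calls keeps the categories aligned; swapping stripplot for swarmplot, or boxplot for violinplot, follows the same pattern.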
© 2025 ApX Machine Learning