While box plots provide a concise summary of a distribution through its quartiles, they don't show the full shape. Kernel Density Estimate (KDE) plots, on the other hand, excel at visualizing the shape but don't explicitly mark summary statistics like the median or interquartile range (IQR). Violin plots cleverly combine both approaches.
A violin plot displays a KDE mirrored on each side (forming the "violin" shape) and often includes a representation of the summary statistics inside, similar to a box plot. This allows you to see the overall shape, density, modality (number of peaks), and key summary points simultaneously, offering a richer view of the data's distribution compared to using either a box plot or a KDE plot alone. They are particularly effective when comparing distributions across different categories.
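The combination described above can be made concrete without a plotting library. The NumPy sketch below computes the ingredients of a violin for a hypothetical sample: a Gaussian KDE mirrored about zero (the violin's left and right edges), the quartiles that the default inner box marks, and a simple peak count to gauge modality. The Scott's-rule bandwidth and the `cut` of two bandwidths past the extreme observations are assumed to resemble seaborn's defaults. All function names are illustrative, and this is not seaborn's implementation.

```python
import numpy as np

def scott_bandwidth(data):
    """Scott's rule of thumb: sample standard deviation * n^(-1/5)."""
    data = np.asarray(data, dtype=float)
    return data.std(ddof=1) * data.size ** (-1 / 5)

def gaussian_kde_1d(data, grid, bandwidth):
    """Average of Gaussian kernels centred on each observation, evaluated on `grid`."""
    data = np.asarray(data, dtype=float)
    z = (grid[:, None] - data[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    return kernels.sum(axis=1) / (data.size * bandwidth)

def violin_outline(data, cut=2.0, n_points=200):
    """Return (grid, left, right): the KDE mirrored about zero, like a violin's edges."""
    data = np.asarray(data, dtype=float)
    bw = scott_bandwidth(data)
    # Extend the curve `cut` bandwidths past the most extreme observations
    grid = np.linspace(data.min() - cut * bw, data.max() + cut * bw, n_points)
    density = gaussian_kde_1d(data, grid, bw)
    return grid, -density, density

def box_summary(data):
    """Quartiles and IQR: the statistics a default inner box marks."""
    q1, median, q3 = np.percentile(data, [25, 50, 75])
    return {"q1": q1, "median": median, "q3": q3, "iqr": q3 - q1}

def count_modes(density, min_rel_height=0.05):
    """Count local maxima, ignoring bumps lower than min_rel_height * highest peak."""
    d = np.asarray(density, dtype=float)
    # Strictly above the left neighbour and not below the right one,
    # so a two-point tie at the top is counted once
    is_peak = (d[1:-1] > d[:-2]) & (d[1:-1] >= d[2:])
    return int(np.sum(d[1:-1][is_peak] >= min_rel_height * d.max()))

# Hypothetical sample with two clusters, so the violin should show two bulges
rng = np.random.default_rng(0)
sample = np.concatenate([rng.normal(15, 3, 300), rng.normal(35, 3, 150)])
grid, left, right = violin_outline(sample)
stats = box_summary(sample)
print(f"median={stats['median']:.1f}, IQR={stats['iqr']:.1f}, modes={count_modes(right)}")
```

A box plot of this sample would show only one box, while the mirrored density makes the two clusters obvious, which is the gap violin plots are meant to close.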
seaborn.violinplot
Seaborn makes creating violin plots straightforward with the seaborn.violinplot() function. Let's start by visualizing the distribution of a single numerical variable, using the familiar 'tips' dataset.
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
# Load the example dataset
tips = sns.load_dataset("tips")
# Create a violin plot for the 'total_bill' column
plt.figure(figsize=(6, 4)) # Adjust figure size for better readability
sns.violinplot(y=tips["total_bill"])
plt.title("Distribution of Total Bill Amounts")
plt.ylabel("Total Bill ($)")
plt.show()
A violin plot showing the distribution of total bill amounts. The widest parts indicate where most bills fall, and the shape suggests a slight right skew.
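In this plot, the width of the violin at any bill amount reflects the estimated density there, so the widest section marks the most common range of bills. The miniature box plot inside (seaborn's default) shows the interquartile range as a thick bar and the median as a white dot, with thin lines extending toward the rest of the data.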
The real strength of violin plots emerges when comparing the distribution of a numerical variable across groups defined by a categorical variable. We assign the categorical variable to the x axis and the numerical variable to the y axis.
Let's compare the distribution of total_bill for each day in the dataset (Thursday through Sunday).
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
# Load the example dataset
tips = sns.load_dataset("tips")
# Compare total_bill distribution across different days
plt.figure(figsize=(8, 5))
# Color each day's violin; seaborn >= 0.13 expects palette to be paired with hue
sns.violinplot(x="day", y="total_bill", hue="day", data=tips, palette="coolwarm", legend=False)
plt.title("Distribution of Total Bill by Day")
plt.xlabel("Day of the Week")
plt.ylabel("Total Bill ($)")
plt.show()
Example violin plot comparing total bill distributions across days.
This plot makes it easy to visually compare the median bill, the spread, and the shape of the distribution across the four days.
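Behind each violin's inner box are ordinary per-category quartiles, and computing them directly is a useful check on what the plot shows. The pandas sketch below uses a small synthetic DataFrame (hypothetical gamma-distributed bills with made-up group sizes) so it runs offline; with the real data you would group `tips` the same way. It is an assumption, not something stated above, that pandas' default linear interpolation matches the quartiles seaborn draws closely.

```python
import numpy as np
import pandas as pd

# Hypothetical long-form data: one row per bill, with a categorical 'day' column
rng = np.random.default_rng(42)
days = np.repeat(["Thur", "Fri", "Sat", "Sun"], [60, 20, 80, 70])
bills = rng.gamma(shape=4.0, scale=5.0, size=days.size)
df = pd.DataFrame({"day": days, "total_bill": bills})

# Quartiles per category: the numbers each violin's inner box encodes
summary = (
    df.groupby("day")["total_bill"]
    .quantile([0.25, 0.5, 0.75])
    .unstack()
    .rename(columns={0.25: "q1", 0.5: "median", 0.75: "q3"})
)
summary["iqr"] = summary["q3"] - summary["q1"]
# Group sizes matter when violin widths are scaled by count
summary["n"] = df.groupby("day").size()
print(summary.round(2))
```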
Seaborn's violinplot offers several parameters for customization:
- inner: Controls what is drawn inside each violin. Options include:
  - 'box' (default): a miniature box plot.
  - 'quart': lines at the three quartiles (Q1, median, Q3). The example below passes 'quartile', which seaborn also accepts.
  - 'point' or 'stick': one point or one short line per individual observation.
  - None: only the violin outline.
- palette: Sets the colors of the violins. In seaborn 0.13 and later it is meant to be used together with hue; passing it without hue is deprecated.
- hue: Adds another layer of categorization, using color to separate groups within each primary category on the x-axis.
- split: When hue has exactly two levels, split=True draws half a violin for each level, so the two distributions can be compared side by side within the same violin.
- scale: Determines how the widths of the violins are scaled. In seaborn 0.13 and later this parameter is named density_norm and scale is deprecated.
  - 'area' (default): every violin has the same area.
  - 'count': violin width is scaled by the number of observations in that category.
  - 'width': every violin has the same maximum width.

Let's use hue and split to compare bills between smokers and non-smokers on each day:
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
# Load the example dataset
tips = sns.load_dataset("tips")
# Compare total_bill by day, split by smoker status
plt.figure(figsize=(10, 6))
sns.violinplot(x="day", y="total_bill", hue="smoker", data=tips,
               palette="muted", split=True, inner="quartile",
               density_norm="count")  # seaborn >= 0.13; older versions use scale="count"
plt.title("Distribution of Total Bill by Day and Smoker Status")
plt.xlabel("Day of the Week")
plt.ylabel("Total Bill ($)")
plt.legend(title="Smoker")
plt.show()
Violin plot comparing total bill distributions by day, split by smoker status. split=True combines the two smoker groups into one violin per day for direct comparison, inner='quartile' draws quartile lines, and count scaling (density_norm='count', or scale='count' in older seaborn) adjusts width by observation count.
This "split violin" plot shows, for example, that on the weekend days (Saturday and Sunday) the distribution shape and range differ between smokers and non-smokers. The inner='quartile' option replaces the mini box plot with lines at the 25th, 50th (median), and 75th percentiles. Count scaling makes a violin half wider when it represents more observations; on Thursday, for instance, there are more non-smokers than smokers, so the non-smoker half is wider.
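The three width options differ only in how each violin's density curve is rescaled before it is drawn. The NumPy sketch below applies one plausible reading of the documented rules to two hypothetical groups with made-up sizes. The function normalize_widths and its exact formulas are assumptions for illustration; seaborn's internals, including how it handles hue nesting, may differ.

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    """Gaussian density, standing in for a group's KDE curve."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def normalize_widths(densities, counts, method="area"):
    """Rescale density curves into half-widths in [0, 1] (assumed rules):

    'area'  - one common factor for every violin, so equal-area curves stay equal-area
    'width' - each curve divided by its own peak, so every violin is equally wide
    'count' - like 'width', then multiplied by n / max(n)
    """
    densities = [np.asarray(d, dtype=float) for d in densities]
    counts = np.asarray(counts, dtype=float)
    if method == "area":
        global_max = max(d.max() for d in densities)
        return [d / global_max for d in densities]
    if method == "width":
        return [d / d.max() for d in densities]
    if method == "count":
        return [d / d.max() * (n / counts.max()) for d, n in zip(densities, counts)]
    raise ValueError(f"unknown method: {method!r}")

grid = np.linspace(-10, 10, 2001)
# Hypothetical groups: a narrow group with few observations, a wide group with many
densities = [normal_pdf(grid, 0, 1), normal_pdf(grid, 0, 3)]
counts = [20, 60]
for method in ("area", "width", "count"):
    widths = normalize_widths(densities, counts, method)
    print(method, [round(float(w.max()), 3) for w in widths])
```

Under 'area' the wide group looks thin because its density is spread out, under 'width' both violins reach full width, and under 'count' the smaller group is drawn at a third of the larger one's width.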
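When looking at a violin plot, pay attention to the overall shape (symmetry or skew), the number of peaks (modality), where the density is concentrated, the spread and length of the tails, and the position of the median and quartiles relative to the bulk of the violin.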
Violin plots are an excellent choice when you need to compare distributions across groups and the shape of the distribution is informative. They provide more detail than box plots but can become visually complex if you have too many categories or hue levels. They offer a compelling way to present distributional differences clearly.
© 2025 ApX Machine Learning