While understanding the distribution of a single variable using histograms or KDE plots is informative, we often need to understand how two variables interact. Are they related? Does the distribution of one variable change depending on the value of another? Visualizing the joint distribution of two variables helps answer these questions.
Seaborn provides a convenient function, jointplot(), specifically designed for this purpose. It creates a multi-panel figure that shows both the relationship between two variables (the joint distribution) and the distribution of each variable individually along the margins (the marginal distributions).
Think of it as combining a scatter plot (to see the relationship) with histograms (to see individual distributions) all in one figure.
Let's use jointplot() to explore the relationship between the total bill amount and the tip amount from Seaborn's built-in tips dataset. First, we need to load the data and import the necessary libraries:
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
# Load the example tips dataset
tips = sns.load_dataset("tips")
# Display the first few rows to see the data
print(tips.head())
Now, we can create a joint plot using seaborn.jointplot():
# Create a joint plot of total_bill and tip
sns.jointplot(data=tips, x="total_bill", y="tip")
plt.suptitle("Joint Distribution of Total Bill and Tip Amount", y=1.02) # Add a title slightly above the plot
plt.tight_layout() # Adjust layout
plt.show()
When you run this code, you'll see a figure with three parts:
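Central plot (joint distribution): The main panel, where the x-axis is total_bill and the y-axis is tip. This helps visualize the relationship or correlation between these two variables. By default, jointplot uses a scatter plot here.
Top marginal plot: A histogram showing the distribution of the x-axis variable (total_bill).
Right marginal plot: A histogram showing the distribution of the y-axis variable (tip).
This combined view allows you to simultaneously assess the relationship between total_bill and tip and understand the shape, center, and spread of each variable independently.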
The default jointplot uses a scatter plot for the central view, but you can change this using the kind parameter. This allows you to represent the joint distribution in different ways, which can be more suitable depending on your data and what you want to emphasize.
Common options for kind include:
kind="scatter" (default): Shows individual points. Good for smaller datasets where points don't overlap too much.
kind="kde": Uses Kernel Density Estimates for both the joint (central) and marginal plots. This shows a smoothed representation of the distribution, useful for visualizing the density of points.
kind="hist": Uses a 2D histogram (heatmap-like) for the central plot and standard histograms for the margins. Good for larger datasets where points overlap significantly.
kind="hex": Similar to kind="hist", but uses hexagonal bins instead of square ones for the central plot. It's another way to handle overlapping points in large datasets.
kind="reg": Adds a regression line to the scatter plot in the central view and shows KDE plots with histograms on the margins. Useful for highlighting a linear relationship.
kind="resid": Plots the residuals of a linear regression in the central plot. This is more specialized for checking the assumptions of a linear model.
Let's try creating a joint plot with kernel density estimates:
# Create a joint plot using KDE
sns.jointplot(data=tips, x="total_bill", y="tip", kind="kde")
plt.suptitle("Joint KDE Plot of Total Bill and Tip", y=1.02)
plt.tight_layout()
plt.show()
This plot uses smoothed curves instead of bars or points. The central plot shows contour lines, with the innermost contours enclosing the regions where data points are most concentrated. The marginal plots are smoothed 1D KDE curves.
Here's an example using hexagonal binning, which is great for larger datasets where scatter plots become overcrowded:
# Create a joint plot using hexagonal binning
# Using a slightly larger dataset for better effect (simulated)
import numpy as np
np.random.seed(0) # for reproducibility
data_large = pd.DataFrame({
    'x_var': np.random.randn(1000),
    'y_var': np.random.randn(1000) * 0.5 + np.random.randn(1000) * 0.2
})
sns.jointplot(data=data_large, x="x_var", y="y_var", kind="hex", color="#4263eb") # Using an indigo color
plt.suptitle("Joint Hexbin Plot for Larger Data", y=1.02)
plt.tight_layout()
plt.show()
In this hexbin plot, the central area uses hexagons, and the color intensity of each hexagon indicates the number of data points falling within it. This avoids the problem of overlapping points seen in scatter plots with many data points.
You can also customize the appearance further using parameters like color, height (to control the size), and joint_kws and marginal_kws for passing specific arguments to the underlying plot functions.
jointplot() is a powerful tool because it packages three plots into one coherent figure, providing a concise summary of the bivariate and univariate distributions. It's particularly effective during exploratory data analysis to quickly understand relationships and distributions between pairs of numerical variables.
© 2025 ApX Machine Learning