While generating plots like histograms, scatter plots, or the pair plots discussed earlier provides an initial view of the data, their true value in communication comes from clear annotation. A plot without labels or a title is often ambiguous and difficult to interpret correctly. Effectively using titles, axis labels, and legends transforms a basic chart into a meaningful piece of analysis. Matplotlib and Seaborn offer straightforward ways to add these essential components.
Let's assume we have a Pandas DataFrame df with columns like 'feature_A', 'feature_B', and 'category'. We might generate a scatter plot:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
# Sample data for demonstration
np.random.seed(42)
data = {
    'feature_A': np.random.rand(50) * 10,
    'feature_B': 0.5 * (np.random.rand(50) * 10) + np.random.randn(50) * 1.5,
    'category': np.random.choice(['Group 1', 'Group 2'], size=50)
}
df = pd.DataFrame(data)
# Basic scatter plot
plt.figure(figsize=(8, 5)) # Control figure size
sns.scatterplot(x='feature_A', y='feature_B', data=df)
plt.show()
This initial plot shows the relationship but lacks context. Let's enhance it.
A title summarizes the plot's main message or content. Because Seaborn draws on Matplotlib Axes, the same calls work for both libraries: plt.title() sets the title of the current Axes, and the Axes method ax.set_title() sets it on a specific Axes object.
# Create the plot and get the Axes object
plt.figure(figsize=(8, 5))
ax = sns.scatterplot(x='feature_A', y='feature_B', data=df)
# Add a descriptive title
ax.set_title('Relationship between Feature A and Feature B')
plt.show()
Calling ax.set_title() gives the reader immediate context about what the plot represents. Choose titles that are concise yet informative.
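The difference between the two title calls matters most when a figure holds more than one Axes. The following is a minimal sketch using plain Matplotlib and synthetic data (the variable names and the data are assumptions made for illustration): each Axes gets its own title through ax.set_title(), and fig.suptitle() adds one title for the whole figure. Calling plt.title() here would affect only whichever Axes is currently active.

```python
import matplotlib.pyplot as plt
import numpy as np

# Synthetic data, for illustration only
rng = np.random.default_rng(0)
x = rng.random(50) * 10
y = 0.5 * x + rng.normal(0, 1.5, 50)

# One figure, two Axes side by side
fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(10, 4))
ax_left.scatter(x, y)
ax_right.hist(y, bins=10)

# Each Axes is titled explicitly, so there is no ambiguity about the target
ax_left.set_title('Feature B vs Feature A')
ax_right.set_title('Distribution of Feature B')

# A figure-level title sits above both panels
figure_title = fig.suptitle('Overview of Feature B')

fig.tight_layout()
```

Call plt.show() afterwards to display the figure in an interactive session.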
Axis labels are critical for understanding what the x-axis and y-axis represent. Without them, the scale and variables are unknown. Use plt.xlabel() and plt.ylabel(), or the Axes object methods ax.set_xlabel() and ax.set_ylabel(). It's good practice to include units if applicable (e.g., 'Temperature (°C)', 'Revenue (USD)').
# Create the plot and get the Axes object
plt.figure(figsize=(8, 5))
ax = sns.scatterplot(x='feature_A', y='feature_B', data=df)
# Add title and axis labels
ax.set_title('Relationship between Feature A and Feature B')
ax.set_xlabel('Feature A (Units)') # Be specific with units if known
ax.set_ylabel('Feature B (Response)')
plt.show()
Now, anyone looking at the plot knows precisely which variables are being plotted on which axis.
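When the title and both axis labels are set together, Matplotlib's ax.set() accepts them as keyword arguments in a single call. The sketch below uses plain Matplotlib with synthetic data (both are assumptions for illustration); the Axes returned by sns.scatterplot() is the same kind of object, so the same call should apply to it.

```python
import matplotlib.pyplot as plt
import numpy as np

# Synthetic data, for illustration only
rng = np.random.default_rng(1)
x = rng.random(50) * 10
y = 0.5 * x + rng.normal(0, 1.5, 50)

fig, ax = plt.subplots(figsize=(8, 5))
ax.scatter(x, y)

# Equivalent to calling set_title(), set_xlabel(), and set_ylabel() separately
ax.set(
    title='Relationship between Feature A and Feature B',
    xlabel='Feature A (Units)',
    ylabel='Feature B (Response)',
)
```

The separate setter methods remain the better choice when a label needs its own styling, such as a font size, because ax.set() only passes the text itself.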
Legends are necessary when your plot includes multiple groups or categories distinguished by color, marker style, or line type. Seaborn often automatically adds a legend when you use parameters like hue, style, or size.
Let's modify our scatter plot to color points by 'category':
# Create the plot with hue for category
plt.figure(figsize=(9, 6)) # Slightly larger figure for legend
ax = sns.scatterplot(x='feature_A', y='feature_B', hue='category', data=df)
# Add title and axis labels
ax.set_title('Feature B vs Feature A, by Category')
ax.set_xlabel('Feature A (Units)')
ax.set_ylabel('Feature B (Response)')
# Customize legend position (optional)
# sns.move_legend(ax, 'upper left', title='Category Type') # Example customization
plt.show()
Seaborn automatically added a legend because we used the hue parameter. The legend maps the colors to the category names ('Group 1', 'Group 2').
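Sometimes you might need more control over the legend. Matplotlib's ax.legend() provides options for placement (loc), adding a title to the legend (title), removing the frame (frameon=False), and more. Common loc values include 'best', 'upper right', 'upper left', 'lower left', 'lower right', 'center left', 'center right', 'lower center', 'upper center', and 'center'. Seaborn might sometimes place the legend awkwardly, so adjusting loc can improve readability. For a legend that Seaborn itself created, sns.move_legend() accepts the same options and keeps the existing entries; depending on the Seaborn version, calling ax.legend() again without explicit handles may rebuild the legend without them.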
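The ax.legend() options described above can be tried on a plain Matplotlib scatter plot. This is a minimal sketch with synthetic data; the group offsets and the bbox_to_anchor value are assumptions chosen for illustration. It places a frameless, titled legend just outside the right edge of the plotting area so that it cannot cover any points.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)
fig, ax = plt.subplots(figsize=(9, 6))

# One scatter call per group; the label is the text the legend displays
for name, offset in [('Group 1', 0.0), ('Group 2', 2.0)]:
    x = rng.random(25) * 10
    y = 0.5 * x + offset + rng.normal(0, 1.5, 25)
    ax.scatter(x, y, label=name)

legend = ax.legend(
    title='Category',
    loc='upper left',          # which corner of the legend box gets anchored
    bbox_to_anchor=(1.02, 1),  # anchor point in Axes coordinates, right of the plot
    frameon=False,
)

# Make room for the legend inside the figure
fig.tight_layout()
```

With bbox_to_anchor set, loc names the corner of the legend box that is pinned to the anchor point rather than a position inside the Axes.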
Effective visualization involves applying these elements together. Here's the fully customized example:
# Create the plot with hue and customize
plt.figure(figsize=(9, 6))
ax = sns.scatterplot(x='feature_A', y='feature_B', hue='category', data=df, s=60) # Increased point size
# Add title and axis labels
ax.set_title('Feature B vs Feature A, Grouped by Category', fontsize=14, fontweight='bold')
ax.set_xlabel('Feature A (Units)', fontsize=12)
ax.set_ylabel('Feature B (Response)', fontsize=12)
# Improve grid visibility
ax.grid(True, linestyle='--', alpha=0.6)
# Customize legend; sns.move_legend() updates the legend Seaborn already drew
sns.move_legend(ax, 'upper left', title='Customer Group', frameon=True)
# Improve tick label appearance (optional)
plt.xticks(fontsize=10)
plt.yticks(fontsize=10)
# Adjust layout to prevent labels overlapping
plt.tight_layout()
plt.show()
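This final plot is significantly clearer than the unannotated scatter plot we started with: the bold title states what is shown, both axes are labeled, the light dashed grid makes values easier to read, and the titled legend identifies each group.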
Taking the time to customize plots with titles, labels, and legends is a fundamental step in EDA. It ensures that your visual explorations are not only useful for your own understanding but can also be effectively communicated to others, forming a basis for discussion and further analysis, including the feature engineering steps we will discuss next.