Alright, let's put your skills into practice. In the previous sections of this chapter, you learned how to load data using Pandas and the different ways Matplotlib and Seaborn can interact with DataFrames. Now, we'll work through a complete example, starting from loading data from a file and creating meaningful visualizations.
Imagine you have a dataset containing monthly sales figures for different product categories stored in a Comma Separated Values (CSV) file. Our goal is to load this data and visualize the sales trends.
First, ensure you have a CSV file named monthly_sales.csv in your working directory with the following content:
Month,CategoryA_Sales,CategoryB_Sales
Jan,150,80
Feb,160,95
Mar,175,90
Apr,180,105
May,195,110
Jun,210,100
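If you would rather create the file from Python than by hand, the following is a minimal sketch that writes the same rows to disk. It assumes you have write access to the current working directory; the variable names csv_text and csv_path are only illustrative.

```python
from pathlib import Path

# The same rows shown above, written out verbatim
csv_text = """Month,CategoryA_Sales,CategoryB_Sales
Jan,150,80
Feb,160,95
Mar,175,90
Apr,180,105
May,195,110
Jun,210,100
"""

# Write the file into the current working directory
csv_path = Path("monthly_sales.csv")
csv_path.write_text(csv_text, encoding="utf-8")
print(f"Wrote {csv_path.resolve()}")
```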
Now, let's start our Python script by importing the necessary libraries and loading the data into a Pandas DataFrame.
# Import libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load the dataset from the CSV file
try:
    df_sales = pd.read_csv('monthly_sales.csv')
    print("Data loaded successfully:")
    print(df_sales.head())  # Display the first few rows
except FileNotFoundError:
    print("Error: 'monthly_sales.csv' not found.")
    # You might want to stop execution or handle this differently
    # For this example, we'll create a placeholder DataFrame if file not found
    data = {'Month': ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun'],
            'CategoryA_Sales': [150, 160, 175, 180, 195, 210],
            'CategoryB_Sales': [80, 95, 90, 105, 110, 100]}
    df_sales = pd.DataFrame(data)
    print("\nCreated placeholder data:")
    print(df_sales.head())

# Basic check of data types
print("\nData Information:")
df_sales.info()
This code first imports Pandas, Matplotlib's Pyplot module, and Seaborn using their standard aliases. It then attempts to load monthly_sales.csv. We include a try-except block to handle the case where the file might be missing, creating a default DataFrame so the rest of the code can still run for demonstration. Finally, df_sales.head() shows the first few rows, and df_sales.info() gives us a summary of the columns and their data types. You should see that Month is an object (string) and the sales columns are integers (int64).
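If you want to confirm the column types programmatically rather than by reading the info() output, something like the sketch below could work. It parses the same CSV text from memory so that it runs on its own, and it uses the helper functions in pandas.api.types instead of comparing dtype names, since the exact name reported for text columns can differ between Pandas versions. The names df_check and type_checks are illustrative.

```python
import io

import pandas as pd
from pandas.api.types import is_integer_dtype, is_string_dtype

# Same content as monthly_sales.csv, held in memory for a self-contained check
csv_text = """Month,CategoryA_Sales,CategoryB_Sales
Jan,150,80
Feb,160,95
Mar,175,90
Apr,180,105
May,195,110
Jun,210,100
"""
df_check = pd.read_csv(io.StringIO(csv_text))

# Record whether each column has the kind of dtype we expect
type_checks = {
    'Month': is_string_dtype(df_check['Month']),
    'CategoryA_Sales': is_integer_dtype(df_check['CategoryA_Sales']),
    'CategoryB_Sales': is_integer_dtype(df_check['CategoryB_Sales']),
}
print(type_checks)

# Also confirm that no values are missing before plotting
has_missing = bool(df_check.isna().any().any())
print("Missing values present:", has_missing)
```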
Let's create a line plot showing the sales trend for both categories using Matplotlib directly. We'll pass the relevant columns from our df_sales DataFrame to the ax.plot() method of a Matplotlib Axes object.
# Ensure data is loaded before plotting
if 'df_sales' in locals():
    # Create a figure and axes for the plot
    fig, ax = plt.subplots(figsize=(8, 5))  # Adjust figure size

    # Plot Category A sales
    ax.plot(df_sales['Month'], df_sales['CategoryA_Sales'],
            marker='o', linestyle='-', color='#4263eb', label='Category A')

    # Plot Category B sales
    ax.plot(df_sales['Month'], df_sales['CategoryB_Sales'],
            marker='s', linestyle='--', color='#f76707', label='Category B')

    # Add title and labels
    ax.set_title('Monthly Sales Trend by Category (Matplotlib)')
    ax.set_xlabel('Month')
    ax.set_ylabel('Sales Units')

    # Add a legend
    ax.legend()

    # Improve layout and display the plot
    plt.tight_layout()
    plt.show()
else:
    print("DataFrame 'df_sales' not available for plotting.")
Here, we explicitly select the Month column for the x-axis and the respective sales columns (CategoryA_Sales, CategoryB_Sales) for the y-axis. We customize the plot with markers (marker), line styles (linestyle), and colors (color), and we give each line a label, which ax.legend() then displays. We also set the title and axis labels using methods on the Axes object (ax). plt.tight_layout() adjusts spacing, and plt.show() displays the visualization. This approach gives you fine-grained control over each element by directly using Matplotlib functions.
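As one illustration of that control, the sketch below adjusts the y-axis range, adds a light grid, rotates the tick labels, and saves the figure to disk instead of showing it. It rebuilds the sample data inline so it runs on its own. The Agg backend, the output filename sales_trend.png, and the chosen axis limits are assumptions made for this example, not part of the original walkthrough.

```python
import matplotlib
matplotlib.use("Agg")  # Non-interactive backend, so no display window is needed
import matplotlib.pyplot as plt
import pandas as pd

df_sales = pd.DataFrame({
    'Month': ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun'],
    'CategoryA_Sales': [150, 160, 175, 180, 195, 210],
    'CategoryB_Sales': [80, 95, 90, 105, 110, 100],
})

fig, ax = plt.subplots(figsize=(8, 5))
ax.plot(df_sales['Month'], df_sales['CategoryA_Sales'], marker='o', label='Category A')
ax.plot(df_sales['Month'], df_sales['CategoryB_Sales'], marker='s', label='Category B')

# Fine-tune individual elements through the Axes object
ax.set_ylim(0, 250)                      # Start the y-axis at zero (illustrative limits)
ax.grid(True, linestyle=':', alpha=0.5)  # Light dotted grid
ax.tick_params(axis='x', rotation=45)    # Rotate the month labels
ax.legend()

# Save to a file (hypothetical filename) instead of calling plt.show()
fig.tight_layout()
fig.savefig('sales_trend.png', dpi=150)
```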
Pandas DataFrames have a convenient .plot() method that acts as a wrapper around Matplotlib. Let's create a similar line plot using this method.
# Ensure data is loaded before plotting
if 'df_sales' in locals():
    # Use the Pandas .plot() method
    ax = df_sales.plot(kind='line', x='Month', y=['CategoryA_Sales', 'CategoryB_Sales'],
                       marker='o', figsize=(8, 5),
                       color=['#4263eb', '#f76707'])  # Specify colors for lines

    # Set title and labels (Pandas sets some defaults, but we can override)
    ax.set_title('Monthly Sales Trend by Category (Pandas .plot)')
    ax.set_xlabel('Month')
    ax.set_ylabel('Sales Units')
    ax.legend(title='Category')  # Customize legend

    # Display the plot
    plt.tight_layout()
    plt.show()
else:
    print("DataFrame 'df_sales' not available for plotting.")
This code is more concise. We call .plot() directly on the df_sales DataFrame. We specify kind='line', the column for the x-axis (x='Month'), and a list of columns for the y-axis (y=['CategoryA_Sales', 'CategoryB_Sales']). Pandas automatically handles plotting multiple lines and adding a basic legend. We can still access the underlying Matplotlib Axes object (returned by .plot()) to make further customizations, such as setting the title and labels.
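The same method can draw other chart types by changing kind. The sketch below, which rebuilds the sample data so it runs on its own, draws grouped bars and renames the columns first so the automatically generated legend is easier to read. The display_names mapping and the Agg backend are assumptions made for this illustration.

```python
import matplotlib
matplotlib.use("Agg")  # Non-interactive backend, so no display window is needed
import matplotlib.pyplot as plt
import pandas as pd

df_sales = pd.DataFrame({
    'Month': ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun'],
    'CategoryA_Sales': [150, 160, 175, 180, 195, 210],
    'CategoryB_Sales': [80, 95, 90, 105, 110, 100],
})

# Hypothetical display names, chosen here only to tidy the legend
display_names = {'CategoryA_Sales': 'Category A', 'CategoryB_Sales': 'Category B'}

# Pandas uses the (renamed) column names as legend entries
ax = df_sales.rename(columns=display_names).plot(
    kind='bar', x='Month', y=list(display_names.values()),
    rot=0, figsize=(8, 5), color=['#4263eb', '#f76707'])

# The returned object is a regular Matplotlib Axes, so the usual methods apply
ax.set_ylabel('Sales Units')
legend_labels = [text.get_text() for text in ax.get_legend().get_texts()]
print(legend_labels)
```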
Seaborn excels at creating statistically informative and aesthetically pleasing plots, often with less code, especially when working with "tidy" data. Our current DataFrame format (often called "wide" format) isn't ideal for some Seaborn functions that prefer "long" format (where each observation is a row). Let's first melt the DataFrame to convert it to long format, which is a common data preparation step.
# Ensure data is loaded before melting/plotting
if 'df_sales' in locals():
    # Melt the DataFrame from wide to long format
    df_sales_long = pd.melt(df_sales,
                            id_vars=['Month'],  # Column(s) to keep as identifier variables
                            value_vars=['CategoryA_Sales', 'CategoryB_Sales'],  # Columns to unpivot
                            var_name='Category',  # Name for the new column holding original column names
                            value_name='Sales')  # Name for the new column holding values
    print("\nLong format DataFrame:")
    print(df_sales_long.head())

    # Now, create a line plot using Seaborn
    plt.figure(figsize=(8, 5))  # Control figure size with Matplotlib
    sns.lineplot(data=df_sales_long, x='Month', y='Sales', hue='Category',
                 marker='o', palette=['#4263eb', '#f76707'])  # Use hue for categories

    # Add title and labels (Seaborn sets some defaults)
    plt.title('Monthly Sales Trend by Category (Seaborn)')
    plt.xlabel('Month')
    plt.ylabel('Sales Units')

    # Improve layout and display the plot
    plt.tight_layout()
    plt.show()

    # Example: Create a bar plot comparing total sales per category
    # Note: This requires numeric aggregation first or using barplot's estimator
    category_totals = df_sales_long.groupby('Category')['Sales'].sum().reset_index()
    print("\nTotal Sales per Category:")
    print(category_totals)

    plt.figure(figsize=(6, 4))
    # Note: recent Seaborn versions may warn when palette is passed without hue
    sns.barplot(data=category_totals, x='Category', y='Sales', palette=['#4263eb', '#f76707'])
    plt.title('Total Sales Comparison')
    plt.xlabel('Product Category')
    plt.ylabel('Total Sales Units')
    plt.tight_layout()
    plt.show()
else:
    print("DataFrame 'df_sales' not available for plotting.")
First, we use pd.melt to transform df_sales. The id_vars=['Month'] argument keeps the 'Month' column, while value_vars specifies the columns whose values we want to consolidate. The var_name='Category' argument creates a new column containing the original column names ('CategoryA_Sales', 'CategoryB_Sales'), and value_name='Sales' creates a column holding the corresponding sales figures.
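To make the reshaping concrete, the sketch below melts the sample data, checks the resulting shape, and then pivots back to wide format to show that no values were lost. Stripping the '_Sales' suffix and the names df_long and df_wide are choices made for this illustration only.

```python
import pandas as pd

df_sales = pd.DataFrame({
    'Month': ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun'],
    'CategoryA_Sales': [150, 160, 175, 180, 195, 210],
    'CategoryB_Sales': [80, 95, 90, 105, 110, 100],
})

df_long = pd.melt(df_sales,
                  id_vars=['Month'],
                  value_vars=['CategoryA_Sales', 'CategoryB_Sales'],
                  var_name='Category',
                  value_name='Sales')

# 6 months x 2 categories gives 12 rows; the columns are Month, Category, Sales
print(df_long.shape)

# Optional tidy-up: drop the '_Sales' suffix so legend labels read more cleanly
df_long['Category'] = df_long['Category'].str.replace('_Sales', '', regex=False)

# Pivot back to wide format; reindex restores the original month order,
# because pivot sorts the index alphabetically
df_wide = (df_long
           .pivot(index='Month', columns='Category', values='Sales')
           .reindex(df_sales['Month']))
print(df_wide)
```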
With the data in long format (df_sales_long), creating the Seaborn line plot (sns.lineplot) is straightforward. We pass the entire DataFrame to data, and specify column names as strings for x, y, and, importantly, hue. The hue='Category' argument tells Seaborn to draw separate lines (with different colors and potentially styles) for each unique value in the 'Category' column. Seaborn automatically handles the legend and applies its default styling. We use palette to assign specific colors.
We also added a second example using sns.barplot. Notice that for the bar plot showing total sales, we first needed to calculate these totals using Pandas' groupby() and sum() before plotting with Seaborn. Seaborn's barplot by default shows the mean (and a confidence interval) if multiple values exist per category on the x-axis; since we pre-aggregated, it shows the sums directly.
Figure: Simple bar chart comparing the total units sold for Category A versus Category B over the entire period.
This hands-on practice demonstrates how you can load data using Pandas and then choose the most suitable plotting tool (Matplotlib, Pandas .plot(), or Seaborn) based on your needs. You saw how Pandas integrates well with both libraries and how data reshaping (like melting) can facilitate visualization, particularly with Seaborn. You are now equipped to load your own datasets and start creating insightful visualizations.
© 2025 ApX Machine Learning