Hands-on Practice: Loading and Plotting Data

Let's put the concepts from this chapter into practice. We'll build a sample time series dataset with Pandas, make sure it has a proper time index, and create some initial visualizations to understand its basic structure. This exercise reinforces the techniques covered earlier: getting data into a DataFrame, setting the index, and generating plots to identify patterns like trends or seasonality.

First, import the necessary libraries. We'll use Pandas for data manipulation, NumPy to generate the sample values, and Matplotlib for visualization (the Pandas plotting methods use Matplotlib under the hood).

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

# Configure plots for better readability
# Note: this style name requires Matplotlib 3.6+; older versions call it 'seaborn-whitegrid'
plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams['figure.figsize'] = (10, 6)

Generating Sample Data

Instead of loading an external file, let's generate a monthly time series dataset. This helps focus on the manipulation and plotting aspects. We'll create data representing monthly widget sales over several years, incorporating a trend and seasonality.

# Generate a date range for 5 years of monthly data
dates = pd.date_range(start='2019-01-01', periods=60, freq='MS')

# Generate data with trend and seasonality
np.random.seed(42) # for reproducibility
trend = np.linspace(50, 150, 60) # Linear upward trend
seasonality = 15 * np.sin(np.arange(60) * (2 * np.pi / 12)) # Yearly cycle (12-month period)
noise = np.random.normal(0, 10, 60) # Random noise

# Combine components
sales = trend + seasonality + noise
sales = np.maximum(sales, 10) # Floor values at 10 so sales stay positive

# Create DataFrame
widget_sales = pd.DataFrame({'Sales': sales}, index=dates)

print("Sample Widget Sales Data:")
print(widget_sales.head())
print("\nData Information:")
widget_sales.info()

Executing this code creates a Pandas DataFrame named widget_sales. The pd.date_range function generates monthly timestamps ('MS' stands for Month Start), which we use directly as the index. The .head() method shows the first few rows, confirming the structure: a 'Sales' column and a DatetimeIndex. The .info() method confirms the index type (DatetimeIndex) and the data type of the 'Sales' column (float64).
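When the data comes from a file rather than being generated, the same result (a DataFrame with a DatetimeIndex) can be reached at load time. The following is a minimal sketch, not part of the exercise itself: the CSV text, the column name Month, and the variable loaded are made up for illustration, and io.StringIO stands in for a real file path.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for an external file (illustrative values)
csv_text = """Month,Sales
2019-01-01,52.3
2019-02-01,58.1
2019-03-01,66.7
2019-04-01,71.0
"""

# parse_dates converts the column to timestamps; index_col makes it the index
loaded = pd.read_csv(io.StringIO(csv_text), parse_dates=["Month"], index_col="Month")

# asfreq reindexes onto a regular month-start grid and records the frequency;
# any month missing from the file would show up as a NaN row
loaded = loaded.asfreq("MS")

print(type(loaded.index).__name__)
print(loaded.index.freqstr)
```

Setting the frequency explicitly is optional, but it makes gaps in the data visible early instead of letting them pass silently into later calculations.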

Visualizing the Time Series

With the data loaded and correctly indexed, the next step is to plot it. A simple line plot is often the best starting point for time series data, as it visually represents the sequence of observations over time.

# Plot the time series data
widget_sales['Sales'].plot()
plt.title('Monthly Widget Sales (2019-2023)')
plt.xlabel('Date')
plt.ylabel('Sales Units')
plt.show()

This plot command uses the plotting method built into Pandas Series (DataFrames offer the same method). Because the index is a DatetimeIndex, Pandas automatically formats the x-axis with date labels.
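The same figure can also be drawn on an explicit Matplotlib Axes, which is handy once you want finer control over labels or several panels. This is an optional sketch: the data is rebuilt so the block runs on its own, and the Agg backend line is an assumption made so it works without a display (drop it in a notebook and call plt.show() as before).

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Rebuild the same synthetic series used in this section
dates = pd.date_range(start="2019-01-01", periods=60, freq="MS")
np.random.seed(42)
trend = np.linspace(50, 150, 60)
seasonality = 15 * np.sin(np.arange(60) * (2 * np.pi / 12))
noise = np.random.normal(0, 10, 60)
sales = pd.Series(np.maximum(trend + seasonality + noise, 10), index=dates, name="Sales")

# Create the figure and axes explicitly, then let Pandas draw onto that axes
fig, ax = plt.subplots(figsize=(10, 6))
sales.plot(ax=ax)
ax.set_title("Monthly Widget Sales (2019-2023)")
ax.set_xlabel("Date")
ax.set_ylabel("Sales Units")
fig.tight_layout()
```

Passing ax=ax keeps the Pandas date formatting while giving you a handle on the axes object for later adjustments.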

Here is a representation of the resulting plot:

Synthetic monthly widget sales data from January 2019 to December 2023.

From this initial plot, we can observe:

An upward trend: Sales generally increase over the five-year period.
A seasonal pattern: There appears to be a recurring pattern within each year, with peaks and troughs occurring at similar times.
Noise/Irregularity: The line isn't perfectly smooth, indicating random fluctuations around the underlying trend and seasonal pattern.
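These visual impressions can be cross-checked with a couple of group-by summaries. The sketch below rebuilds the same synthetic series so it runs on its own; subtracting each year's mean is a crude detrending step chosen only for illustration, not the decomposition method covered later, and the names yearly_mean, deviation, and monthly_profile are introduced here.

```python
import numpy as np
import pandas as pd

# Rebuild the same synthetic series used in this section
dates = pd.date_range(start="2019-01-01", periods=60, freq="MS")
np.random.seed(42)
trend = np.linspace(50, 150, 60)
seasonality = 15 * np.sin(np.arange(60) * (2 * np.pi / 12))
noise = np.random.normal(0, 10, 60)
sales = pd.Series(np.maximum(trend + seasonality + noise, 10), index=dates, name="Sales")

# Trend: the average level per calendar year should rise from year to year
yearly_mean = sales.groupby(sales.index.year).mean()

# Seasonality: remove each year's mean, then average what is left by calendar month
deviation = sales - sales.groupby(sales.index.year).transform("mean")
monthly_profile = deviation.groupby(deviation.index.month).mean()

print(yearly_mean.round(1))
print(monthly_profile.round(1))
```

A steadily rising yearly_mean supports the trend observation, while a monthly_profile that swings between positive and negative values supports the seasonal one. Because the within-year trend is not removed, the profile is only a rough guide to the seasonal shape.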

Exploring Further with Rolling Statistics

While formal decomposition techniques will be covered in the next chapter, we can use rolling window calculations (introduced earlier) to visually smooth the data and highlight the trend. A rolling mean dampens noise, and when the window spans one full seasonal cycle (12 months here), it also averages out the seasonality.

Let's compute and plot a 12-month rolling mean alongside the original data.

# Calculate 12-month rolling mean
widget_sales['Rolling Mean (12M)'] = widget_sales['Sales'].rolling(window=12).mean()

# Plot original data and rolling mean
widget_sales['Sales'].plot(label='Original Sales', legend=True)
widget_sales['Rolling Mean (12M)'].plot(label='Rolling Mean (12M)', legend=True, color='orange')

plt.title('Widget Sales with 12-Month Rolling Mean')
plt.xlabel('Date')
plt.ylabel('Sales Units')
plt.show()

Here's how this combined plot might look:

Original widget sales data plotted with its 12-month rolling mean.

The orange line representing the rolling mean clearly shows the upward trend, smoothing out the seasonal peaks and troughs visible in the blue line (original data). Note that the rolling mean starts only after the first 11 data points because it requires a full window of 12 observations to compute the first value.
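The gap at the start of the rolling mean is easy to confirm numerically. The sketch below rebuilds the same synthetic series so it runs on its own; the min_periods variation at the end is an optional alternative, not something the exercise above uses.

```python
import numpy as np
import pandas as pd

# Rebuild the same synthetic series used in this section
dates = pd.date_range(start="2019-01-01", periods=60, freq="MS")
np.random.seed(42)
trend = np.linspace(50, 150, 60)
seasonality = 15 * np.sin(np.arange(60) * (2 * np.pi / 12))
noise = np.random.normal(0, 10, 60)
sales = pd.Series(np.maximum(trend + seasonality + noise, 10), index=dates, name="Sales")

rolling_mean = sales.rolling(window=12).mean()

# The first 11 entries are NaN: no full 12-observation window exists yet
print(rolling_mean.isna().sum())         # → 11
print(rolling_mean.first_valid_index())  # December 2019, the 12th observation

# Optional variation: min_periods=1 averages whatever observations are available
partial = sales.rolling(window=12, min_periods=1).mean()
print(partial.isna().sum())              # → 0
```

With min_periods=1 the early values are averages over fewer than 12 months, so they still carry seasonal swings and should be read with more caution than the rest of the line.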

This practice exercise demonstrated getting time series data into a Pandas DataFrame, ensuring the index is correctly set, and performing initial visualizations. These steps are fundamental for any time series analysis project, providing a first look at the data's behavior and informing subsequent analysis steps like decomposition and modeling.