We've discussed the concepts of time series decomposition and the distinction between additive and multiplicative models. Now, let's put this into practice using Python. The statsmodels library provides robust tools for statistical analysis, including time series decomposition.
The main function we'll employ is statsmodels.tsa.seasonal.seasonal_decompose. This function implements a classical decomposition method, typically based on moving averages, to separate a time series into its trend, seasonal, and residual components.
The seasonal_decompose function requires the time series itself and information about the model type and seasonal period.
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose
import matplotlib.pyplot as plt # Using matplotlib for basic plotting example
# Assume 'series' is your pandas Series with a DatetimeIndex.
# It should contain the time series values you want to decompose.
# Example:
# series = pd.read_csv('your_data.csv',
# index_col='Date',
# parse_dates=True)['Value']
# --- Perform Decomposition ---
# model: Specify 'additive' or 'multiplicative'.
# period: The number of observations per seasonal cycle
# (e.g., 12 for monthly data with yearly seasonality,
# 7 for daily data with weekly seasonality).
# Example for additive model with monthly data (period=12)
decomposition_result = seasonal_decompose(series, model='additive', period=12)
# The function returns a DecomposeResult object. You can access
# the individual components as attributes:
trend_component = decomposition_result.trend
seasonal_component = decomposition_result.seasonal
residual_component = decomposition_result.resid
observed_data = decomposition_result.observed # Original data
# --- Visualize the Components ---
# A common way to visualize is stacking the plots:
fig, axes = plt.subplots(4, 1, figsize=(10, 8), sharex=True)
observed_data.plot(ax=axes[0], legend=False, color='#1c7ed6')
axes[0].set_ylabel('Observed')
trend_component.plot(ax=axes[1], legend=False, color='#f76707')
axes[1].set_ylabel('Trend')
seasonal_component.plot(ax=axes[2], legend=False, color='#37b24d')
axes[2].set_ylabel('Seasonal')
residual_component.plot(ax=axes[3], legend=False, color='#adb5bd')
axes[3].set_ylabel('Residual')
plt.xlabel('Date')
plt.suptitle('Time Series Decomposition', y=0.92) # Add a title slightly above plots
plt.tight_layout(rect=[0, 0, 1, 0.9]) # Adjust layout to prevent title overlap
plt.show()
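To make the idea of a moving-average-based classical decomposition concrete, here is a minimal sketch of the additive case written with NumPy and pandas only. It is an illustration under stated assumptions, not the statsmodels implementation: the function name classical_additive and the toy series are hypothetical, and the sketch assumes an evenly spaced series with no missing values.

```python
import numpy as np
import pandas as pd


def classical_additive(series, period):
    """Hand-rolled additive decomposition (illustrative sketch only)."""
    # Centered moving average. For an even period, the two end weights are
    # halved so that the window stays symmetric around each point.
    if period % 2 == 0:
        weights = np.r_[0.5, np.ones(period - 1), 0.5] / period
    else:
        weights = np.ones(period) / period
    half = len(weights) // 2

    values = series.to_numpy(dtype=float)
    trend_values = np.full(len(values), np.nan)
    # 'valid' mode only returns points that have a full window around them,
    # so the first and last `half` positions stay NaN.
    trend_values[half:len(values) - half] = np.convolve(values, weights, mode="valid")
    trend = pd.Series(trend_values, index=series.index)

    # Average the detrended values at each position in the cycle,
    # then center the seasonal pattern so it sums to zero over one cycle.
    detrended = series - trend
    position = np.arange(len(series)) % period
    seasonal_means = detrended.groupby(position).mean()
    seasonal_means = seasonal_means - seasonal_means.mean()
    seasonal = pd.Series(seasonal_means.to_numpy()[position], index=series.index)

    resid = series - trend - seasonal
    return trend, seasonal, resid


# Hypothetical toy data: linear trend plus a 12-step sine wave, no noise.
t = np.arange(48)
index = pd.date_range("2020-01-01", periods=48, freq="MS")
toy = pd.Series(10 + 0.5 * t + 3 * np.sin(2 * np.pi * t / 12), index=index)

trend, seasonal, resid = classical_additive(toy, period=12)
print(trend.isna().sum())  # number of edge points without a trend estimate
```

Because the toy series has no noise, the recovered trend matches the straight line wherever a full window is available, and the residuals there are essentially zero.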
Choosing the Model Type:
Use model='additive' when the seasonal variations appear relatively constant over time, regardless of the series' level. The magnitude of the peaks and troughs in the seasonal pattern remains roughly the same. This corresponds to the $y_t = T_t + S_t + R_t$ model.
Use model='multiplicative' when the seasonal variations scale with the level of the series: as the trend increases, the amplitude of the seasonal pattern also increases. This fits the $y_t = T_t \times S_t \times R_t$ model. If you suspect multiplicative effects, taking the logarithm of the series can make the pattern additive ($\log(y_t) \approx \log(T_t) + \log(S_t) + \log(R_t)$), which can simplify modeling.
Specifying the period:
This parameter is critical for correctly identifying the seasonal pattern. It represents the number of time steps in a full seasonal cycle. Common values include:
period=12 for monthly data with an annual cycle.
period=4 for quarterly data with an annual cycle.
period=7 for daily data with a weekly cycle.
period=52 for weekly data with an annual cycle (approximately).
Choosing the correct period based on your data's known frequency is essential for meaningful decomposition.
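The log-transform remark in the model discussion can be checked numerically. The sketch below builds a hypothetical, noise-free multiplicative series (all values are made up for illustration) and compares the seasonal swing per year on the raw scale and on the log scale.

```python
import numpy as np

# Hypothetical noise-free multiplicative series: y_t = T_t * S_t (R_t = 1).
t = np.arange(48)
trend = 100 + 5.0 * t
seasonal = 1 + 0.2 * np.sin(2 * np.pi * t / 12)
y = trend * seasonal

# On the log scale the product becomes a sum.
log_gap = float(np.max(np.abs(np.log(y) - (np.log(trend) + np.log(seasonal)))))

# Seasonal deviation from the trend on each scale.
raw_dev = y - trend
log_dev = np.log(y) - np.log(trend)


def yearly_swing(values, period=12):
    """Peak-to-trough range within each full cycle."""
    return [float(values[i:i + period].max() - values[i:i + period].min())
            for i in range(0, len(values), period)]


raw_swings = yearly_swing(raw_dev)  # grows as the trend rises
log_swings = yearly_swing(log_dev)  # stays the same every year
print(raw_swings)
print(log_swings)
```

On the raw scale the swing widens each year, which is the signature of a multiplicative pattern. On the log scale it is constant, so an additive decomposition of the logged series would be a reasonable alternative.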
Let's apply this to the classic 'AirPassengers' dataset, known for its strong upward trend and increasing annual seasonality. This increasing seasonality suggests a multiplicative model might be more appropriate.
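We'll perform both decompositions for comparison. (Assume air_passengers is a pandas Series containing the monthly passenger counts, indexed by date. The code below generates a synthetic stand-in with a similar shape so the example is self-contained.)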
# Assume 'air_passengers' is a pandas Series with a DatetimeIndex.
# Synthetic stand-in data for demonstration:
import numpy as np  # Needed here for the random noise term
date_rng = pd.date_range(start='1949-01-01', end='1960-12-01', freq='MS')
passengers = (
    110 +  # Base level
    (date_rng.year - 1949) * 22 +  # Linear upward trend
    (1 + (date_rng.year - 1949) * 0.12) * (  # Seasonal amplitude grows each year
        35 * (date_rng.month == 7) + 25 * (date_rng.month == 8) +
        -25 * (date_rng.month == 11) + -20 * (date_rng.month == 2)
    ) +
    pd.Series(np.random.normal(0, 12, size=len(date_rng)), index=date_rng)  # Random noise
).astype(int)
air_passengers = pd.Series(passengers, index=date_rng)
# Perform Multiplicative Decomposition (period=12 for monthly annual cycle)
result_mul = seasonal_decompose(air_passengers, model='multiplicative', period=12)
# Perform Additive Decomposition for comparison
result_add = seasonal_decompose(air_passengers, model='additive', period=12)
# -- Plotting with Plotly for better web display --
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import numpy as np  # Also required by the synthetic data generation above
fig = make_subplots(
    rows=4, cols=2,
    shared_xaxes=True,
    subplot_titles=('Multiplicative: Observed', 'Additive: Observed',
                    'Multiplicative: Trend', 'Additive: Trend',
                    'Multiplicative: Seasonal', 'Additive: Seasonal',
                    'Multiplicative: Residual', 'Additive: Residual'),
    vertical_spacing=0.06  # Gap between subplot rows
)
# Helper to add a trace, dropping the NaNs at the ends of trend/resid
def add_trace_safe(fig, data, name, color, row, col, showlegend=True):
    # Only plot where data is not NaN
    valid = data.dropna()
    fig.add_trace(
        go.Scatter(x=valid.index, y=valid.values, mode='lines',
                   name=name, line=dict(color=color), showlegend=showlegend),
        row=row, col=col)
# Add traces column by column
add_trace_safe(fig, result_mul.observed, 'Observed', '#1c7ed6', 1, 1)
add_trace_safe(fig, result_mul.trend, 'Trend', '#f76707', 2, 1)
add_trace_safe(fig, result_mul.seasonal, 'Seasonal', '#37b24d', 3, 1)
add_trace_safe(fig, result_mul.resid, 'Residual', '#adb5bd', 4, 1)
add_trace_safe(fig, result_add.observed, 'Observed', '#1c7ed6', 1, 2, showlegend=False)
add_trace_safe(fig, result_add.trend, 'Trend', '#f76707', 2, 2, showlegend=False)
add_trace_safe(fig, result_add.seasonal, 'Seasonal', '#37b24d', 3, 2, showlegend=False)
add_trace_safe(fig, result_add.resid, 'Residual', '#adb5bd', 4, 2, showlegend=False)
fig.update_layout(
    height=750,
    title_text='Time Series Decomposition: Multiplicative vs. Additive (Air Passengers)',
    margin=dict(l=60, r=30, t=90, b=60),
    hovermode='x unified',
    legend_title_text='Components',
    plot_bgcolor='#e9ecef',   # Light gray background
    paper_bgcolor='#ffffff'   # White paper background
)
fig.update_xaxes(title_text="Date", row=4, col=1, showgrid=False)
fig.update_xaxes(title_text="Date", row=4, col=2, showgrid=False)
fig.update_yaxes(showgrid=True, gridwidth=1, gridcolor='#dee2e6') # Lighter grid lines
# Display the figure (use fig.to_json() instead to export it for embedding)
fig.show()
Comparison of multiplicative (left column) and additive (right column) decomposition applied to the air passenger dataset.
Notice the NaN values (missing segments) at the beginning and end of the trend line. This happens because the moving average calculation used internally requires data points both before and after the current point.
While seasonal_decompose is a useful exploratory tool, keep these points in mind:
The trend and residual estimates at the beginning and end of the series are missing (NaN) or less reliable due to the moving average requiring a full window of data.
Decomposition provides valuable insights into the underlying structure of your time series. Examining the components, particularly the residuals, helps assess whether the original series exhibits clear patterns and whether removing them results in something closer to random noise. This assessment is a direct lead-in to the concept of stationarity, which we will investigate more formally using statistical tests in the following sections.
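As a rough first look at whether residuals resemble random noise, one option is to inspect their autocorrelation at a few lags. The sketch below is an informal check only, not one of the formal tests covered later. The helper name residual_autocorr and both example series are hypothetical, and in practice you would pass the resid component of a decomposition result.

```python
import numpy as np
import pandas as pd


def residual_autocorr(resid, lags=(1, 12)):
    """Autocorrelation of a residual series at the given lags (NaNs dropped)."""
    clean = resid.dropna()
    return {lag: float(clean.autocorr(lag=lag)) for lag in lags}


# Hypothetical residuals that look like noise.
rng = np.random.default_rng(0)
noise_like = pd.Series(rng.normal(0, 1, 200))

# Hypothetical residuals that still contain a 12-step seasonal pattern.
t = np.arange(200)
patterned = pd.Series(np.sin(2 * np.pi * t / 12))

noise_acf = residual_autocorr(noise_like)
patterned_acf = residual_autocorr(patterned)
print(noise_acf)
print(patterned_acf)
```

Values near zero at every lag are consistent with noise, while a large value at the seasonal lag suggests the decomposition left some structure behind.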
© 2025 ApX Machine Learning