Let's put the concepts of decomposition and stationarity testing into practice. We'll use a common time series dataset, analyze its components, test for stationarity, and apply transformations to make it stationary. This hands-on exercise will solidify your understanding of these fundamental preparation steps before building forecasting models.
We assume you have a working Python environment with libraries like pandas, matplotlib, and statsmodels installed.
First, let's load a dataset. A classic example exhibiting trend and seasonality is the 'AirPassengers' dataset, which contains monthly international airline passenger counts. We'll simulate loading such data into a Pandas DataFrame and ensure the index is correctly set as datetime objects.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.tsa.stattools import adfuller
# Load the data (Replace with actual path if using your own file)
# For reproducibility, we'll create a similar synthetic dataset here
np.random.seed(42)  # fix the seed so the noise (and thus the results) are repeatable
date_rng = pd.date_range(start='1949-01-01', end='1960-12-01', freq='MS')
# Generate data with trend and seasonality similar to AirPassengers
base = 100
trend_factor = np.linspace(1, 3, len(date_rng))
seasonal_factor = (np.sin(np.arange(len(date_rng)) * (2 * np.pi / 12)) + 1) * 0.5 + 0.75 # Min 0.75, Max 1.75
noise = np.random.normal(1, 0.05, len(date_rng))
passengers = base * trend_factor * seasonal_factor * noise
data = pd.DataFrame(passengers.astype(int), index=date_rng, columns=['Passengers'])
print("Dataset first 5 rows:")
print(data.head())
print("\nDataset last 5 rows:")
print(data.tail())
# Plot the original data
plt.figure(figsize=(12, 5))
plt.plot(data.index, data['Passengers'])
plt.title('Simulated Monthly Airline Passengers')
plt.xlabel('Date')
plt.ylabel('Passengers')
plt.grid(True, linestyle='--', alpha=0.6)
plt.show()
The initial plot clearly shows an upward trend (more passengers over time) and a repeating annual pattern (seasonality). The variance also appears to increase over time, suggesting a multiplicative relationship between the components.
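As an aside, a log transform is one common way to handle exactly this kind of multiplicative structure: taking logarithms turns multiplicative components into additive ones and tends to stabilize growing variance. A minimal sketch on a hypothetical positive-valued series (we won't use this transform in the rest of the walkthrough, which works with the raw series instead):

```python
import numpy as np
import pandas as pd

# Hypothetical positive-valued monthly series (values are illustrative only)
idx = pd.date_range("1949-01-01", periods=24, freq="MS")
values = pd.Series(np.arange(1, 25) * 10.0, index=idx)

# log(trend * seasonal * noise) = log(trend) + log(seasonal) + log(noise),
# so a multiplicative series becomes additive on the log scale
log_values = np.log(values)
print(log_values.head())
```

If you model on the log scale, remember to invert the transform (np.exp) when converting forecasts back to the original units.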
Let's decompose the series to explicitly visualize the trend, seasonality, and residual components. Since the seasonality and variance seem to grow with the level of the series, a multiplicative decomposition is likely more appropriate.
# Perform multiplicative decomposition
decomposition = seasonal_decompose(data['Passengers'], model='multiplicative', period=12) # period=12 for monthly data
# Plot the decomposed components
fig = decomposition.plot()
fig.set_size_inches(10, 8)
fig.suptitle('Multiplicative Decomposition', y=1.02) # Adjust title position
plt.tight_layout(rect=[0, 0.03, 1, 0.98]) # Adjust layout to prevent overlap
plt.show()
The decomposition plot separates the original series ('observed') into its estimated trend, seasonal pattern, and residual ('resid') components. This confirms the strong upward trend and the consistent yearly seasonality. The residuals appear relatively random around 1.0, although their variance might also increase slightly over time.
Now, we'll formally check if the original series is stationary.
A common visual check involves plotting rolling statistics (mean and standard deviation). If these change significantly over time, the series is likely non-stationary.
# Calculate rolling statistics
rolling_mean = data['Passengers'].rolling(window=12).mean()
rolling_std = data['Passengers'].rolling(window=12).std()
# Plot rolling statistics
plt.figure(figsize=(12, 5))
plt.plot(data['Passengers'], color='#228be6', label='Original')
plt.plot(rolling_mean, color='#f03e3e', label='Rolling Mean (12 months)')
plt.plot(rolling_std, color='#0ca678', label='Rolling Std Dev (12 months)')
plt.legend(loc='best')
plt.title('Rolling Mean & Standard Deviation')
plt.xlabel('Date')
plt.ylabel('Passengers')
plt.grid(True, linestyle='--', alpha=0.6)
plt.show()
The rolling mean clearly follows the upward trend, and the rolling standard deviation increases over time. Both indicate non-stationarity.
Let's use the ADF test for a statistical confirmation. The null hypothesis (H0) for the ADF test is that the time series has a unit root, meaning it is non-stationary. We want to reject H0. A common significance level is 5% (or 0.05).
# Function to perform and print ADF test results
def perform_adf_test(series, series_name=""):
    print(f"--- ADF Test Results for {series_name} ---")
    # Drop NA values, which can occur after differencing
    result = adfuller(series.dropna())
    print(f'ADF Statistic: {result[0]:.4f}')
    print(f'p-value: {result[1]:.4f}')
    print('Critical Values:')
    for key, value in result[4].items():
        print(f'\t{key}: {value:.4f}')
    if result[1] <= 0.05:
        print("\nResult: Reject the null hypothesis (H0). Data is likely Stationary.")
    else:
        print("\nResult: Fail to reject the null hypothesis (H0). Data is likely Non-Stationary.")
    print("-" * 40)
# Perform ADF test on the original series
perform_adf_test(data['Passengers'], "Original Passengers Series")
You should see output similar to this (exact values depend on the synthetic data generation):
--- ADF Test Results for Original Passengers Series ---
ADF Statistic: 0.8350
p-value: 0.9922
Critical Values:
1%: -3.4817
5%: -2.8840
10%: -2.5788
Result: Fail to reject the null hypothesis (H0). Data is likely Non-Stationary.
----------------------------------------
The p-value (e.g., 0.9922) is much greater than 0.05. Therefore, we fail to reject the null hypothesis and conclude that the original passenger series is non-stationary, confirming our visual inspection.
To make the series stationary, we can apply differencing. Let's start with first-order differencing to remove the trend. This calculates the difference between consecutive observations: $y'_t = y_t - y_{t-1}$.
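As a tiny illustration of what .diff() computes (the values here are made up):

```python
import pandas as pd

s = pd.Series([112.0, 118.0, 132.0, 129.0])

# diff() subtracts the previous observation; the first value has no
# predecessor, so it becomes NaN
print(s.diff().tolist())  # [nan, 6.0, 14.0, -3.0]
```

The leading NaN is why perform_adf_test calls dropna() before running the test.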
# Apply first-order differencing
data['Passengers_diff1'] = data['Passengers'].diff()
# Plot the first-differenced series
plt.figure(figsize=(12, 5))
plt.plot(data.index, data['Passengers_diff1'])
plt.title('First-Differenced Passenger Series (Trend Removed)')
plt.xlabel('Date')
plt.ylabel('Difference')
plt.grid(True, linestyle='--', alpha=0.6)
plt.show()
# Perform ADF test on the first-differenced series
perform_adf_test(data['Passengers_diff1'], "First-Differenced Series")
The plot shows that the trend is largely removed, but a strong seasonal pattern remains. The ADF test results might look like this:
--- ADF Test Results for First-Differenced Series ---
ADF Statistic: -2.7171
p-value: 0.0711
Critical Values:
1%: -3.4825
5%: -2.8844
10%: -2.5789
Result: Fail to reject the null hypothesis (H0). Data is likely Non-Stationary.
----------------------------------------
The p-value (e.g., 0.0711) is lower but still above 0.05. The series is likely still non-stationary due to the remaining seasonality. We need to address this using seasonal differencing, which calculates the difference between an observation and the observation from the previous season (e.g., 12 months ago for monthly data). Applied to our already-differenced series, this is $y''_t = y'_t - y'_{t-m}$, where $m$ is the seasonal period.
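To see why differencing at the seasonal lag works, consider a toy pattern that repeats exactly every 12 observations (values invented for illustration); differencing at lag 12 removes it completely:

```python
import numpy as np
import pandas as pd

# A purely seasonal pattern repeated three times (36 observations)
pattern = np.array([0.0, 1, 3, 6, 8, 9, 9, 8, 6, 3, 1, 0])
s = pd.Series(np.tile(pattern, 3))

# s_t - s_{t-12} is zero everywhere the lagged value exists,
# because the pattern repeats exactly
print(s.diff(12).dropna().abs().max())  # 0.0
```

Real seasonality is never this exact, so the seasonally differenced series fluctuates around zero rather than vanishing.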
Often, we need to apply both regular and seasonal differencing. Let's apply seasonal differencing to the first-differenced series.
# Apply seasonal differencing (period=12) to the first-differenced series
data['Passengers_diff1_seasonal12'] = data['Passengers_diff1'].diff(12)
# Plot the combined differenced series
plt.figure(figsize=(12, 5))
plt.plot(data.index, data['Passengers_diff1_seasonal12'])
plt.title('First & Seasonally Differenced Passenger Series (d=1, D=1, m=12)')
plt.xlabel('Date')
plt.ylabel('Combined Difference')
plt.grid(True, linestyle='--', alpha=0.6)
plt.show()
# Perform ADF test on the combined differenced series
perform_adf_test(data['Passengers_diff1_seasonal12'], "Combined Differenced Series")
The plot of the combined differenced series should look much more like stationary white noise, fluctuating around zero with a relatively constant variance. The ADF test results should confirm stationarity:
--- ADF Test Results for Combined Differenced Series ---
ADF Statistic: -10.5349 # Example value, will vary
p-value: 0.0000 # Example value, likely very small
Critical Values:
1%: -3.4891
5%: -2.8873
10%: -2.5805
Result: Reject the null hypothesis (H0). Data is likely Stationary.
----------------------------------------
With a very small p-value (e.g., << 0.05), we reject the null hypothesis and conclude that applying both first-order and seasonal differencing (with m=12) has successfully made the series stationary.
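Incidentally, the order in which the two differences are applied does not matter: expanding the algebra, both .diff().diff(12) and .diff(12).diff() yield $y_t - y_{t-1} - y_{t-12} + y_{t-13}$. A quick check on a random series (up to floating-point rounding):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
s = pd.Series(rng.normal(size=60))

a = s.diff().diff(12)   # first difference, then seasonal difference
b = s.diff(12).diff()   # seasonal difference, then first difference

# Both orderings drop the same 13 leading observations and agree elsewhere
print(np.allclose(a.dropna(), b.dropna()))  # True
```

Either way, each pass discards observations at the start (1 for the first difference, 12 for the seasonal one), so keep an eye on how much usable data remains for short series.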
In this practice session, we:

- Loaded and plotted a monthly passenger series exhibiting trend and seasonality.
- Used seasonal_decompose to separate the series into its underlying components.
- Checked stationarity visually with rolling statistics and formally with the ADF test.
- Applied first-order and seasonal differencing to obtain a stationary series.

Having a stationary series is often a prerequisite for applying models like ARIMA and SARIMA, which we will cover in upcoming chapters. The process of decomposition helps understand the data's structure, while stationarity testing and differencing prepare it for modeling.
© 2025 ApX Machine Learning