As we've seen, many time series exhibit trends or seasonality, so statistical properties such as the mean and variance change over time. This non-stationarity poses a challenge for standard forecasting models like ARMA, which assume stationarity. Fortunately, there is a common and effective technique for transforming non-stationary data into stationary data: differencing.
Differencing computes the change between consecutive observations in the time series. For a time series $y_t$, the first difference, denoted $\Delta y_t$, is calculated as:
$$\Delta y_t = y_t - y_{t-1}$$
This simple operation can often stabilize the mean of a time series by removing trends. Imagine a series with a linear upward trend. The values $y_t$ consistently increase, but the difference between consecutive values ($y_t - y_{t-1}$) is roughly constant, hovering around a stable mean that represents the slope of the trend.
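To see why, suppose the series follows a linear trend with noise, $y_t = a + bt + \varepsilon_t$. Then:
$$\Delta y_t = (a + bt + \varepsilon_t) - (a + b(t-1) + \varepsilon_{t-1}) = b + \varepsilon_t - \varepsilon_{t-1}$$
The trend term cancels entirely, leaving a series that fluctuates around the constant slope $b$.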
Let's consider a time series with a clear trend. Applying first-order differencing helps remove this trend, making the series stationary in its mean.
import pandas as pd
import numpy as np
import plotly.graph_objects as go
from statsmodels.tsa.stattools import adfuller
# Generate sample data with a trend
np.random.seed(42)
time = pd.date_range(start='2022-01-01', periods=100, freq='D')
# Trend component + random noise
trend = np.linspace(0, 20, 100)
noise = np.random.normal(0, 2, 100)
data = pd.Series(trend + noise, index=time, name='Original Data')
# Calculate first difference
differenced_data = data.diff().dropna() # dropna removes the first NaN
# Check stationarity before and after
adf_original = adfuller(data)
adf_differenced = adfuller(differenced_data)
print(f"ADF Test on Original Data: p-value = {adf_original[1]:.3f}")
print(f"ADF Test on Differenced Data: p-value = {adf_differenced[1]:.3f}")
# Create Plot
fig = go.Figure()
fig.add_trace(go.Scatter(x=data.index, y=data, mode='lines', name='Original Series (Trend)', line=dict(color='#4263eb')))
fig.add_trace(go.Scatter(x=differenced_data.index, y=differenced_data, mode='lines', name='First Difference', line=dict(color='#12b886')))
fig.update_layout(
title='Effect of First Differencing on Trend',
xaxis_title='Time',
yaxis_title='Value',
legend_title='Series',
template='plotly_white',
width=700,
height=400
)
# fig.show() # In a real environment
The original series shows a clear upward trend, and the ADF test fails to reject the null hypothesis of non-stationarity (p-value is high). After applying first-order differencing, the resulting series fluctuates around a mean of approximately zero, appearing much more stationary. The ADF test on the differenced series now yields a very small p-value, strongly suggesting stationarity.
Sometimes, a single differencing step isn't sufficient. For example, data with a quadratic trend might require differencing twice to achieve stationarity. The second difference is simply the difference of the first difference:
$$\Delta^2 y_t = \Delta(\Delta y_t) = \Delta(y_t - y_{t-1}) = (y_t - y_{t-1}) - (y_{t-1} - y_{t-2})$$
In practice, it's uncommon to need more than two orders of differencing (d = 1 or d = 2). You can implement second-order differencing in Pandas by calling .diff() twice:
# Second difference
differenced_data_2 = data.diff().diff().dropna()
Always check for stationarity (visually and using tests like ADF) after each differencing step. Over-differencing (differencing more times than necessary) can introduce unwanted correlations and complicate modeling.
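As a minimal sketch of that workflow (reusing the data series and the adfuller import from the example above; the 0.05 cutoff is a conventional significance threshold, not a hard rule), you can difference one step at a time and test after each pass:

# Difference step by step, testing for stationarity after each pass
current = data
for d in range(3):  # more than d=2 is rarely needed
    p_value = adfuller(current)[1]
    print(f"d={d}: ADF p-value = {p_value:.3f}")
    if p_value < 0.05:  # reject the unit-root null: treat as stationary
        break
    current = current.diff().dropna()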
If your data exhibits seasonality, simply taking the first difference might not be enough to remove the repeating seasonal pattern. Seasonal differencing involves computing the difference between an observation and the corresponding observation from the previous season (or cycle).
If $m$ is the seasonal period (e.g., $m = 12$ for monthly data, $m = 4$ for quarterly data, $m = 7$ for daily data with a weekly pattern), the seasonal difference is:
$$\Delta_m y_t = y_t - y_{t-m}$$
This operation removes the seasonal component by comparing each value to its counterpart one cycle earlier. In Pandas, you can perform seasonal differencing using the periods argument of the .diff() method:
# Example: Generate data with yearly seasonality (m=12)
time_monthly = pd.date_range(start='2018-01-01', periods=48, freq='MS')
seasonal_component = np.tile(np.sin(np.linspace(0, 2*np.pi, 12)), 4) * 5
trend_monthly = np.linspace(0, 10, 48)
noise_monthly = np.random.normal(0, 1, 48)
data_monthly = pd.Series(trend_monthly + seasonal_component + noise_monthly, index=time_monthly, name='Monthly Data')
# Seasonal differencing (m=12)
seasonal_diff_data = data_monthly.diff(periods=12).dropna()
# Sometimes both regular and seasonal differencing are needed
# First take seasonal difference, then regular difference
combined_diff_data = data_monthly.diff(periods=12).diff(periods=1).dropna()
# Plotting or ADF tests would follow to confirm stationarity...
Often, data with both trend and seasonality requires both a non-seasonal first difference ($\Delta y_t$) and a seasonal difference ($\Delta_m y_t$). The typical approach is to apply the seasonal difference first and then, if needed, apply the non-seasonal difference to the result, as the check below illustrates.
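Continuing the monthly example, a quick sketch of this check (using the adfuller import from earlier) tests whether the seasonal difference alone is enough before adding the non-seasonal one:

# Test stationarity after the seasonal difference alone...
p_seasonal = adfuller(seasonal_diff_data)[1]
print(f"Seasonal difference only: ADF p-value = {p_seasonal:.3f}")

# ...and after adding the non-seasonal first difference
p_combined = adfuller(combined_diff_data)[1]
print(f"Seasonal + first difference: ADF p-value = {p_combined:.3f}")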
The number of times you difference the data to achieve stationarity is a significant parameter in time series modeling. This order of differencing is represented by the d parameter in non-seasonal ARIMA(p, d, q) models and the D parameter in seasonal SARIMA(p, d, q)(P, D, Q)$_m$ models. These models internally handle the differencing based on these parameters.
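For instance, here is a sketch using statsmodels' SARIMAX class (the AR and MA orders shown are placeholders, not tuned values): d and D are passed directly in the model specification, along with the seasonal period m:

from statsmodels.tsa.statespace.sarimax import SARIMAX

# order=(p, d, q): d=1 applies one non-seasonal difference internally
# seasonal_order=(P, D, Q, m): D=1 applies one seasonal difference with m=12
model = SARIMAX(data_monthly, order=(1, 1, 1), seasonal_order=(1, 1, 1, 12))
result = model.fit(disp=False)  # no manual .diff() needed before fitting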
When forecasting with models like ARIMA or SARIMA that use differenced data, the final forecasts need to be transformed back to the original scale. This reverse operation is called integration (hence the 'I' in ARIMA) and amounts to cumulatively summing the differences. Libraries like statsmodels handle this integration automatically when generating forecasts from fitted ARIMA/SARIMA models.
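To make the idea concrete, here is a short sketch: the manual cumulative sum illustrates what integration does, using the first-differenced series from earlier, and the forecast call (on the result object fitted above) shows that statsmodels returns predictions already on the original scale:

# Manually 'integrating' the first difference: cumulative sum plus the initial value
reconstructed = differenced_data.cumsum() + data.iloc[0]
print(np.allclose(reconstructed, data.iloc[1:]))  # True: the original series is recovered

# Forecasts from the fitted SARIMAX model are already on the original scale
forecast = result.forecast(steps=12)
print(forecast.head())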
In summary, differencing is a fundamental technique for making time series data stationary, particularly for removing trends and seasonality. By applying first-order, seasonal, or occasionally second-order differencing, you prepare the data for models that rely on the stationarity assumption. Remember to always verify stationarity after differencing using visual inspection and statistical tests.