Many time series models, particularly the classical statistical ones we will explore, rely on a fundamental assumption about the data's behavior over time: stationarity. Intuitively, a time series is stationary if its statistical properties do not depend on the time at which the series is observed. Think of it like this: if you take a chunk of the series from the beginning and another chunk of the same length from the end, their basic statistical characteristics (like the average value or how spread out the data is) should look roughly the same.
More formally, we usually work with weak-sense stationarity (or second-order stationarity). A time series $\{Y_t\}$ is weakly stationary if it satisfies three conditions:
- Constant Mean: The expected value (mean) of the series is constant over time.
  $$E[Y_t] = \mu \quad \text{for all } t$$
- Constant Variance: The variance of the series is constant and finite over time.
  $$\mathrm{Var}(Y_t) = E[(Y_t - \mu)^2] = \sigma^2 < \infty \quad \text{for all } t$$
- Constant Autocovariance: The covariance between values at two time points depends only on the distance (lag) between those time points, not on the specific time itself.
  $$\mathrm{Cov}(Y_t, Y_{t+h}) = E[(Y_t - \mu)(Y_{t+h} - \mu)] = \gamma_h \quad \text{for all } t \text{ and lags } h$$
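All three quantities can be estimated directly from a sample. The sketch below (using NumPy, with a simulated white-noise series as an illustrative example) computes the sample mean, variance, and lag-$h$ autocovariance:

```python
import numpy as np

rng = np.random.default_rng(42)
y = rng.normal(loc=0.0, scale=1.0, size=1000)  # white noise: stationary by construction

mu_hat = y.mean()    # sample mean, estimates mu
var_hat = y.var()    # sample variance, estimates sigma^2

def autocov(y, h):
    """Sample autocovariance at lag h: average of (y_t - mu)(y_{t+h} - mu)."""
    mu = y.mean()
    return np.mean((y[:-h] - mu) * (y[h:] - mu))

gamma_1 = autocov(y, 1)  # for white noise, gamma_h is close to 0 for h >= 1
```

For a weakly stationary series, repeating these computations on different sub-windows should give roughly the same answers.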
In simpler terms:
- The average level of the series doesn't systematically increase or decrease (no trend).
- The fluctuations around the average level have consistent width (constant variance).
- The relationship between an observation and its lagged values is consistent regardless of where you are in the series.
Consider the contrast with non-stationary data. A series with a clear upward trend violates the constant mean condition. A series whose fluctuations become wider over time violates the constant variance condition. Data with strong seasonality often violates both the constant mean and constant autocovariance conditions, as the mean level and the relationship between points depend on the time of year.
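This contrast is easy to reproduce. The sketch below builds three illustrative series from the same noise: white noise (stationary), a random walk (variance grows with time), and a linear trend plus noise (mean grows with time); comparing the mean of an early window with a late one reveals the difference:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
noise = rng.normal(size=n)

stationary = noise                    # constant mean and variance
random_walk = np.cumsum(noise)        # variance grows with t: non-stationary
trend = 0.05 * np.arange(n) + noise   # mean increases with t: non-stationary

def edge_means(y):
    """Mean of the first quarter vs. mean of the last quarter of a series."""
    q = len(y) // 4
    return y[:q].mean(), y[-q:].mean()

first_s, last_s = edge_means(stationary)  # these two are close
first_t, last_t = edge_means(trend)       # these two differ widely
```

The same check applied to seasonal data would show the window means depending on where in the seasonal cycle the window falls.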
*Figure: The top series fluctuates around a constant mean with constant variance, characteristic of stationary data. The bottom series exhibits a clear upward trend, violating the constant mean condition of stationarity.*
## Why is Stationarity Important for Modeling?
The assumption of stationarity is significant because it simplifies the modeling process considerably.
- Predictability: If a series is stationary, its statistical properties (mean, variance, correlations) learned from historical data are more likely to be relevant for the future. This makes forecasting more reliable because the underlying data generating process is assumed to be stable over time.
- Model Suitability: Many fundamental time series models, like ARMA (Autoregressive Moving Average), are designed for stationary data. These models attempt to explain the fluctuations around a constant mean based on past values and past errors. Applying them directly to non-stationary data can lead to invalid statistical inferences, poor model fits, and unreliable forecasts. The parameters estimated might not be meaningful.
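To illustrate why estimation works well on stationary data, the sketch below simulates an AR(1) process, the simplest autoregressive special case of ARMA, and recovers its coefficient by regressing each value on its predecessor. The coefficient 0.6 is an arbitrary example choice; any value with $|\phi| < 1$ keeps the process stationary:

```python
import numpy as np

rng = np.random.default_rng(1)
n, phi = 2000, 0.6
e = rng.normal(size=n)

# AR(1): y_t = phi * y_{t-1} + e_t
y = np.zeros(n)
for t in range(1, n):
    y[t] = phi * y[t - 1] + e[t]

# Least-squares estimate of phi from the lag-1 regression of y_t on y_{t-1}.
# np.polyfit returns [slope, intercept]; the slope estimates phi.
phi_hat = np.polyfit(y[:-1], y[1:], 1)[0]
```

Because the process is stationary, the estimate converges to the true coefficient as the sample grows; run the same regression on a random walk ($\phi = 1$) and the standard inference theory no longer applies.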
- Avoiding Spurious Results: When analyzing relationships between non-stationary time series, you can encounter "spurious regressions". This means you might find statistically significant relationships between variables that are actually independent but happen to share similar trends. Working with stationary data helps avoid these misleading results.
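The spurious-regression effect can be demonstrated in a few lines: two independent random walks frequently show a large sample correlation purely by chance, while their first differences, which are stationary, do not. This is an informal sketch, not a formal test:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 1000
x = np.cumsum(rng.normal(size=n))  # random walk 1
y = np.cumsum(rng.normal(size=n))  # random walk 2, fully independent of x

# Correlation of the levels is often large in magnitude despite independence:
corr_levels = np.corrcoef(x, y)[0, 1]

# Correlation of the (stationary) first differences is near zero, as it should be:
corr_diffs = np.corrcoef(np.diff(x), np.diff(y))[0, 1]
```

Repeating this with many random seeds shows `corr_levels` scattered far from zero while `corr_diffs` stays tightly concentrated near it.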
Recognizing non-stationarity is the first step towards addressing it. Techniques like decomposition, which we cover next, help identify components like trends and seasonality that cause non-stationarity. Later in this chapter, we'll discuss methods like differencing, which aim to transform non-stationary data into a stationary form suitable for models like ARIMA (Autoregressive Integrated Moving Average), where the "Integrated" part specifically handles non-stationarity due to trends.
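Differencing itself is a one-line transformation. As a minimal sketch (the slope 0.5 is illustrative; in practice you would also verify stationarity with a formal test such as the augmented Dickey-Fuller test), first-differencing a linear trend plus noise removes the trend:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
trend = 0.5 * np.arange(n) + rng.normal(size=n)  # non-stationary: mean grows with t

# First difference: y_t - y_{t-1}. The trend becomes a constant (the slope, 0.5)
# plus a stationary noise term, so the differenced series has a stable mean.
diffed = np.diff(trend)
```

This is exactly the "I" in ARIMA: the model is fit to the differenced (integrated-out) series, then forecasts are cumulated back to the original scale.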