Time series data represents observations collected sequentially over time. Think of daily stock prices, monthly rainfall measurements, or hourly website traffic. Unlike datasets where each observation is typically independent, the defining feature of time series is its inherent temporal dependence.
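One common way to hold such data in Python is a pandas Series indexed by timestamps, so the index itself carries the temporal ordering. The sketch below uses made-up daily closing prices purely for illustration.

```python
import pandas as pd

# Hypothetical daily closing prices indexed by date: the DatetimeIndex
# carries the ordering that distinguishes a time series from a plain column.
prices = pd.Series(
    [101.2, 102.5, 101.8, 103.1, 104.0],
    index=pd.date_range("2024-01-01", periods=5, freq="D"),
    name="closing_price",
)
print(prices)
```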
The value of a series at one point in time, let's call it $y_t$, is often influenced by its value at previous points, such as $y_{t-1}$, $y_{t-2}$, and so on. This relationship between an observation and its predecessors is known as autocorrelation or serial correlation.
Consider monthly sales data for a retail store. High sales in December are likely followed by lower sales in January, creating a negative correlation between $y_{\text{December}}$ and $y_{\text{January}}$. Conversely, strong sales growth in one quarter might suggest continued, albeit potentially slower, growth in the next, indicating a positive correlation over different time lags.
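To make this concrete, the sketch below builds a synthetic monthly sales series with a yearly cycle (the numbers are invented) and measures its serial correlation at a few lags with pandas' `Series.autocorr`. Observations half a cycle apart correlate negatively, while those a full cycle apart correlate positively.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Synthetic monthly sales: a yearly seasonal swing plus random noise
months = pd.date_range("2018-01-01", periods=60, freq="MS")
seasonal = 50 * np.sin(2 * np.pi * np.arange(60) / 12)
sales = pd.Series(200 + seasonal + rng.normal(0, 10, 60), index=months)

# Serial correlation between y_t and y_{t-k} for a few lags k
for lag in (1, 6, 12):
    print(f"lag {lag:>2}: autocorrelation = {sales.autocorr(lag=lag):+.2f}")
```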
This temporal dependence is fundamental. While it violates the independence assumption underlying many standard statistical methods (like ordinary least squares regression applied directly), it's precisely this structure that time series analysis aims to model and exploit for forecasting. If you know today's temperature, you have a much better idea about tomorrow's temperature than if you knew nothing about today.
Because observations are tied to specific points in time, the order matters significantly. Randomly shuffling a time series dataset would destroy the sequential relationships and render most time series analysis techniques meaningless. The sequence $y_1, y_2, \dots, y_T$ contains information that is lost if the order is changed. This contrasts sharply with cross-sectional data (like a survey of customer preferences taken at one point in time), where the order of rows usually carries no information.
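This can be demonstrated directly. The sketch below builds a strongly autocorrelated series from made-up random-walk data, shuffles it, and compares the lag-1 autocorrelation: the shuffled copy contains exactly the same values, but the temporal structure is gone.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# A strongly autocorrelated series: each value builds on the previous one
series = pd.Series(rng.normal(0, 1, 200).cumsum())

# Shuffling keeps every value but erases the ordering, and with it
# the serial correlation that time series methods rely on.
shuffled = pd.Series(rng.permutation(series.values))

print(f"original lag-1 autocorrelation: {series.autocorr(lag=1):+.3f}")   # near 1
print(f"shuffled lag-1 autocorrelation: {shuffled.autocorr(lag=1):+.3f}")  # near 0
```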
Time series frequently exhibit systematic patterns that can be identified and modeled. While we will examine these components in detail later, the main types are:

- Trend: a long-run increase or decrease in the level of the series.
- Seasonality: fluctuations that repeat over a fixed, known period, such as a year, month, or week.
- Cycles: rises and falls with no fixed period, often tied to business or economic conditions.
- Irregular fluctuations: the residual, unpredictable variation left after the other components are accounted for.
The presence of these patterns necessitates specialized techniques to isolate and understand their influence.
A hypothetical time series displaying an upward trend and seasonal variations over 24 months.
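A series like the one described in that figure can be simulated by adding a linear trend, a seasonal cycle, and noise. The numbers below are arbitrary and purely illustrative; plotting requires matplotlib.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# 24 monthly observations: linear upward trend + yearly seasonal cycle + noise
months = pd.date_range("2023-01-01", periods=24, freq="MS")
trend = np.linspace(100, 160, 24)                       # steady growth
seasonal = 15 * np.sin(2 * np.pi * np.arange(24) / 12)  # 12-month cycle
noise = rng.normal(0, 4, 24)

series = pd.Series(trend + seasonal + noise, index=months, name="sales")
series.plot(title="Synthetic series with trend and seasonality")
plt.show()
```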
Time series data is associated with a specific frequency, which defines the interval between consecutive observations. This could be hourly, daily, weekly, monthly, quarterly, annually, or even finer (e.g., every minute or second) or irregular intervals. Knowing the frequency is important for:

- Determining the length of any seasonal cycle (for example, a period of 12 for monthly data with yearly seasonality).
- Choosing appropriate models and their seasonal parameters.
- Aligning, aggregating, or resampling the data consistently.
Most techniques assume the data is collected at regular intervals (a fixed frequency), though methods exist for handling irregularly spaced time series.
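As a small sketch of one such method, the hypothetical temperature readings below arrive at irregular times, so pandas cannot infer a fixed frequency; resampling places them on a regular hourly grid and interpolates the hours with no observation.

```python
import pandas as pd

# Irregularly spaced temperature readings (made-up values and timestamps)
readings = pd.Series(
    [21.5, 22.1, 23.4, 22.8],
    index=pd.to_datetime(
        ["2024-03-01 08:05", "2024-03-01 09:50",
         "2024-03-01 11:02", "2024-03-01 13:45"]
    ),
)

print(readings.index.freq)  # None: no fixed frequency can be inferred

# Resample onto a regular hourly grid, then fill the empty hours by interpolation
hourly = readings.resample("h").mean().interpolate()
print(hourly)
```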
A significant characteristic of many raw time series is non-stationarity. This means the statistical properties of the series, such as its mean, variance, or autocorrelation structure, change over time. A series exhibiting a clear trend or strong seasonality is typically non-stationary because the mean varies with the trend or season.
Many time series models, including the ARIMA models we will study later, assume the data is stationary. Therefore, identifying non-stationarity and transforming the data to achieve stationarity (often through differencing or decomposition) is a common and necessary step in the analysis workflow. We will dedicate Chapter 2 to understanding and addressing stationarity.
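As a brief preview of that chapter, the sketch below generates a synthetic trending series, removes the trend by first differencing ($y_t - y_{t-1}$), and applies the Augmented Dickey-Fuller test from statsmodels, where a small p-value suggests stationarity. The data is invented and the details of the test are covered later.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(7)

# A trending (non-stationary) series: its mean rises steadily over time
t = np.arange(120)
trending = pd.Series(0.5 * t + rng.normal(0, 3, 120))

# First differencing removes the linear trend
differenced = trending.diff().dropna()

# Augmented Dickey-Fuller test: the null hypothesis is non-stationarity,
# so a small p-value indicates the series looks stationary.
for name, series in [("original", trending), ("differenced", differenced)]:
    p_value = adfuller(series)[1]
    print(f"{name:>11}: ADF p-value = {p_value:.4f}")
```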
Recognizing these characteristics (temporal dependence, fixed ordering, potential patterns, frequency, and the possibility of non-stationarity) is the starting point for any time series analysis. They inform how we preprocess the data, visualize it, select appropriate models, and ultimately generate meaningful forecasts.