Time series data represents observations collected sequentially over time. Think of daily stock prices, monthly rainfall measurements, or hourly website traffic. Unlike datasets where each observation is typically independent, the defining feature of time series is its inherent temporal dependence.
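One common way to hold such data in Python is a pandas Series indexed by timestamps, so the index itself carries the temporal ordering. The sketch below uses made-up daily closing prices purely for illustration.

```python
import pandas as pd

# Hypothetical daily closing prices indexed by date: the DatetimeIndex
# carries the ordering that distinguishes a time series from a plain column.
prices = pd.Series(
    [101.2, 102.5, 101.8, 103.1, 104.0],
    index=pd.date_range("2024-01-01", periods=5, freq="D"),
    name="closing_price",
)
print(prices)
```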
The value of a series at one point in time, let's call it $y_t$, is often influenced by its value at previous points, such as $y_{t-1}$, $y_{t-2}$, and so on. This relationship between an observation and its predecessors is known as autocorrelation or serial correlation.
Consider monthly sales data for a retail store. High sales in December are likely followed by lower sales in January, creating a negative correlation between $y_{\text{December}}$ and $y_{\text{January}}$. Conversely, strong sales growth in one quarter might suggest continued, albeit potentially slower, growth in the next, indicating a positive correlation over different time lags.
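To make this concrete, the sketch below builds a synthetic monthly sales series with a yearly cycle (the numbers are invented) and measures its serial correlation at a few lags with pandas' `Series.autocorr`. Observations half a cycle apart correlate negatively, while those a full cycle apart correlate positively.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Synthetic monthly sales: a yearly seasonal swing plus random noise
months = pd.date_range("2018-01-01", periods=60, freq="MS")
seasonal = 50 * np.sin(2 * np.pi * np.arange(60) / 12)
sales = pd.Series(200 + seasonal + rng.normal(0, 10, 60), index=months)

# Serial correlation between y_t and y_{t-k} for a few lags k
for lag in (1, 6, 12):
    print(f"lag {lag:>2}: autocorrelation = {sales.autocorr(lag=lag):+.2f}")
```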
This temporal dependence is fundamental. While it violates the independence assumption underlying many standard statistical methods (like ordinary least squares regression applied directly), it's precisely this structure that time series analysis aims to model and exploit for forecasting. If you know today's temperature, you have a much better idea about tomorrow's temperature than if you knew nothing about today.
Because observations are tied to specific points in time, the order matters significantly. Randomly shuffling a time series dataset would destroy the sequential relationships and render most time series analysis techniques meaningless. The sequence $y_1, y_2, \dots, y_T$ contains information that is lost if the order is changed. This contrasts sharply with cross-sectional data (like a survey of customer preferences taken at one point in time), where the order of rows usually carries no information.
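This can be demonstrated directly. The sketch below builds a strongly autocorrelated series from made-up random-walk data, shuffles it, and compares the lag-1 autocorrelation: the shuffled copy contains exactly the same values, but the temporal structure is gone.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# A strongly autocorrelated series: each value builds on the previous one
series = pd.Series(rng.normal(0, 1, 200).cumsum())

# Shuffling keeps every value but erases the ordering, and with it
# the serial correlation that time series methods rely on.
shuffled = pd.Series(rng.permutation(series.values))

print(f"original lag-1 autocorrelation: {series.autocorr(lag=1):+.3f}")   # near 1
print(f"shuffled lag-1 autocorrelation: {shuffled.autocorr(lag=1):+.3f}")  # near 0
```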
Time series frequently exhibit systematic patterns that can be identified and modeled. While we will examine these components in detail later, the main types are:

- Trend: a long-run increase or decrease in the level of the series.
- Seasonality: fluctuations that repeat over a fixed, known period, such as a year, month, or week.
- Cycles: rises and falls with no fixed period, often tied to business or economic conditions.
- Irregular fluctuations: the residual, unpredictable variation left after the other components are accounted for.
The presence of these patterns necessitates specialized techniques to isolate and understand their influence.
A hypothetical time series displaying an upward trend and seasonal variations over 24 months.
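A series like the one described in that figure can be simulated by adding a linear trend, a seasonal cycle, and noise. The numbers below are arbitrary and purely illustrative; plotting requires matplotlib.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# 24 monthly observations: linear upward trend + yearly seasonal cycle + noise
months = pd.date_range("2023-01-01", periods=24, freq="MS")
trend = np.linspace(100, 160, 24)                       # steady growth
seasonal = 15 * np.sin(2 * np.pi * np.arange(24) / 12)  # 12-month cycle
noise = rng.normal(0, 4, 24)

series = pd.Series(trend + seasonal + noise, index=months, name="sales")
series.plot(title="Synthetic series with trend and seasonality")
plt.show()
```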
Time series data is associated with a specific frequency, which defines the interval between consecutive observations. This could be hourly, daily, weekly, monthly, quarterly, annually, or even finer (e.g., every minute or second) or irregular intervals. Knowing the frequency is important for:

- Determining the length of any seasonal cycle (for example, a period of 12 for monthly data with yearly seasonality).
- Choosing appropriate models and their seasonal parameters.
- Aligning, aggregating, or resampling the data consistently.
Most techniques assume the data is collected at regular intervals (a fixed frequency), though methods exist for handling irregularly spaced time series.
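As a small sketch of one such method, the hypothetical temperature readings below arrive at irregular times, so pandas cannot infer a fixed frequency; resampling places them on a regular hourly grid and interpolates the hours with no observation.

```python
import pandas as pd

# Irregularly spaced temperature readings (made-up values and timestamps)
readings = pd.Series(
    [21.5, 22.1, 23.4, 22.8],
    index=pd.to_datetime(
        ["2024-03-01 08:05", "2024-03-01 09:50",
         "2024-03-01 11:02", "2024-03-01 13:45"]
    ),
)

print(readings.index.freq)  # None: no fixed frequency can be inferred

# Resample onto a regular hourly grid, then fill the empty hours by interpolation
hourly = readings.resample("h").mean().interpolate()
print(hourly)
```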
A significant characteristic of many raw time series is non-stationarity. This means the statistical properties of the series, such as its mean, variance, or autocorrelation structure, change over time. A series exhibiting a clear trend or strong seasonality is typically non-stationary because the mean varies with the trend or season.
Many time series models, including the ARIMA models we will study later, assume the data is stationary. Therefore, identifying non-stationarity and transforming the data to achieve stationarity (often through differencing or decomposition) is a common and necessary step in the analysis workflow. We will dedicate Chapter 2 to understanding and addressing stationarity.
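As a brief preview of that chapter, the sketch below generates a synthetic trending series, removes the trend by first differencing ($y_t - y_{t-1}$), and applies the Augmented Dickey-Fuller test from statsmodels, where a small p-value suggests stationarity. The data is invented and the details of the test are covered later.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(7)

# A trending (non-stationary) series: its mean rises steadily over time
t = np.arange(120)
trending = pd.Series(0.5 * t + rng.normal(0, 3, 120))

# First differencing removes the linear trend
differenced = trending.diff().dropna()

# Augmented Dickey-Fuller test: the null hypothesis is non-stationarity,
# so a small p-value indicates the series looks stationary.
for name, series in [("original", trending), ("differenced", differenced)]:
    p_value = adfuller(series)[1]
    print(f"{name:>11}: ADF p-value = {p_value:.4f}")
```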
Recognizing these characteristics (temporal dependence, fixed ordering, potential patterns, frequency, and the possibility of non-stationarity) is the starting point for any time series analysis. They inform how we preprocess the data, visualize it, select appropriate models, and ultimately generate meaningful forecasts.