Once you have successfully fitted an ARIMA model and performed diagnostic checks to ensure its validity (as covered in the "Model Diagnostics and Residual Analysis" section), the next logical step is to use this model to predict future values. This process is known as forecasting.
The statsmodels library provides convenient methods attached to the fitted model results object (often named results or arima_results in examples) to generate forecasts. The two primary methods are predict() and forecast(). While predict() can be used for both in-sample prediction and out-of-sample forecasting, the forecast() method is designed specifically for generating predictions beyond the end of the training data and is often more straightforward for that purpose.
Let's assume you have a fitted ARIMA results object named arima_results, obtained from fitting an ARIMA model to your time series data.
forecast()
The forecast() method is the simplest way to generate out-of-sample forecasts. You only need to specify the number of steps (time periods) you want to predict into the future.
# Assume arima_results is your fitted ARIMA model object
# Forecast the next 12 time steps
forecast_steps = 12
forecast_values = arima_results.forecast(steps=forecast_steps)
print(forecast_values)
This will return a Pandas Series containing the point forecasts for the specified number of future time steps. The index of the returned Series will typically follow the time index of your original data.
predict()
The predict() method offers more flexibility. It allows you to specify start and end points for prediction. These points can be integer positions or timestamps, depending on the type of your data's index.
If both start and end fall within the range of the original data index, predict() generates in-sample fitted values.
If start and/or end fall beyond the original data index, predict() generates out-of-sample forecasts.
# Assume arima_results is your fitted model
# Assume original data ends at index 'n' or timestamp 't_end'
# Get the index of the last observation
last_index = ts_data.index[-1] # Or use the integer index if applicable
# Define start and end for forecasting 12 steps ahead
# Note: indices must be compatible with your data's index type (e.g., datetime)
forecast_start_index = last_index + pd.Timedelta(days=1) # Example for daily data
forecast_end_index = last_index + pd.Timedelta(days=12) # Example for daily data
# Or using integer indices if applicable
# forecast_start_index = len(ts_data)
# forecast_end_index = len(ts_data) + 11
forecast_values_pred = arima_results.predict(start=forecast_start_index, end=forecast_end_index)
print(forecast_values_pred)
While predict() works, forecast() is generally preferred for its simplicity when you only need future values.
Point forecasts provide a single best estimate for future values, but they don't convey the uncertainty associated with the prediction. ARIMA models, being statistical models, allow us to calculate confidence intervals around these forecasts. A confidence interval provides a range within which the true future value is expected to lie with a certain probability (e.g., 95%). Strictly speaking, an interval for a future observation is a prediction interval, but this section follows the statsmodels naming (conf_int()) and calls it a confidence interval.
To get both the point forecast and the confidence interval, use the get_forecast() method. This returns a PredictionResults object containing more detailed information.
# Assume arima_results is your fitted model
forecast_steps = 12
# Get forecast object
forecast_obj = arima_results.get_forecast(steps=forecast_steps)
# Extract predicted mean (point forecast)
predicted_mean = forecast_obj.predicted_mean
# Extract confidence intervals (default alpha=0.05 for 95% CI)
confidence_intervals = forecast_obj.conf_int(alpha=0.05)
# confidence_intervals is a DataFrame with columns like 'lower y' and 'upper y'
print("Point Forecasts:\n", predicted_mean)
print("\nConfidence Intervals (95%):\n", confidence_intervals)
The alpha parameter determines the confidence level. alpha=0.05 corresponds to a 95% confidence interval (1 - alpha), meaning we expect the true value to fall within the calculated lower and upper bounds 95% of the time, assuming the model is correctly specified.
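To see how alpha translates into interval width, the sketch below builds a two-sided interval by hand from a point forecast and a standard error. It assumes normally distributed forecast errors; the helper name normal_interval and the numbers are hypothetical and are not part of statsmodels.

```python
from statistics import NormalDist


def normal_interval(point_forecast, standard_error, alpha=0.05):
    """Two-sided (1 - alpha) interval, assuming normally distributed forecast errors."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return point_forecast - z * standard_error, point_forecast + z * standard_error


# Hypothetical numbers chosen only for illustration
point_forecast = 100.0
standard_error = 4.0

lower_95, upper_95 = normal_interval(point_forecast, standard_error, alpha=0.05)
lower_80, upper_80 = normal_interval(point_forecast, standard_error, alpha=0.20)

print(f"95% interval: ({lower_95:.2f}, {upper_95:.2f})")  # (92.16, 107.84)
print(f"80% interval: ({lower_80:.2f}, {upper_80:.2f})")  # (94.87, 105.13)
```

A larger alpha gives a narrower interval: you accept a higher chance of the true value falling outside in exchange for tighter bounds.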
Visualizing the forecasts alongside the historical data and confidence intervals is highly recommended. It provides an intuitive understanding of the model's predictions and the associated uncertainty.
Here's how you might plot this using Plotly:
import numpy as np
import pandas as pd
import plotly.graph_objects as go
# Assume:
# ts_data: Original historical time series (Pandas Series)
# predicted_mean: Forecasted values (Pandas Series)
# confidence_intervals: DataFrame with 'lower y' and 'upper y' columns
# Sample data (replace with your actual data)
# Create dummy historical data: linear trend + mild curvature + Gaussian noise
rng = np.random.default_rng(0)
dates_hist = pd.date_range(start='2023-01-01', periods=50, freq='D')
t = np.arange(50)
ts_data = pd.Series(t + 10 * (t / 50)**2 + 5 * rng.standard_normal(50), index=dates_hist)
# Create dummy forecast data
dates_fcst = pd.date_range(start=ts_data.index[-1] + pd.Timedelta(days=1), periods=12, freq='D')
horizon = np.arange(1, 13)
predicted_mean = pd.Series(ts_data.iloc[-1] + 0.5 * horizon + 2 * rng.standard_normal(12), index=dates_fcst)
# Use NumPy arrays for the offsets so the arithmetic is positional;
# a Series with a default integer index would not align with the datetime index and would yield NaN
ci_lower = predicted_mean - 0.8 * horizon
ci_upper = predicted_mean + 0.8 * horizon
confidence_intervals = pd.DataFrame({'lower y': ci_lower, 'upper y': ci_upper})
# Create the plot
fig = go.Figure()
# Add historical data
fig.add_trace(go.Scatter(
x=ts_data.index,
y=ts_data,
mode='lines',
name='Historical Data',
line=dict(color='#1c7ed6') # blue
))
# Add forecast line
fig.add_trace(go.Scatter(
x=predicted_mean.index,
y=predicted_mean,
mode='lines',
name='Forecast',
line=dict(color='#f76707') # orange
))
# Add confidence interval area
fig.add_trace(go.Scatter(
x=confidence_intervals.index.tolist() + confidence_intervals.index.tolist()[::-1], # x values for shape
y=confidence_intervals['upper y'].tolist() + confidence_intervals['lower y'].tolist()[::-1], # y values for shape
fill='toself',
fillcolor='rgba(253, 126, 20, 0.2)', # orange transparent
line=dict(color='rgba(255,255,255,0)'), # No border line
hoverinfo="skip", # Don't show hover label for the shape
name='95% Confidence Interval'
))
# Update layout for better presentation
fig.update_layout(
title='ARIMA Forecast with Confidence Interval',
xaxis_title='Time',
yaxis_title='Value',
hovermode='x unified',
legend=dict(x=0.01, y=0.99)
)
# Show the plot (in a Jupyter environment, otherwise use fig.show())
# fig.show()
# Convert to JSON for embedding if needed
# print(fig.to_json())
Historical data is shown in blue and point forecasts in orange; the shaded region represents the 95% confidence interval.
Observe how the confidence interval typically widens as the forecast horizon increases. This reflects the growing uncertainty about the future; predictions further out are inherently less certain than near-term predictions.
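The widening can be worked out by hand for the simplest case. For a stationary AR(1) process with coefficient phi and noise standard deviation sigma, the h-step-ahead forecast error variance is sigma^2 * (1 + phi^2 + ... + phi^(2(h-1))), which grows with h. The sketch below evaluates this formula; the AR(1) special case, the parameter values, and the helper name ar1_forecast_std are assumptions for illustration, and a general ARIMA model follows the same idea with different weights.

```python
import math


def ar1_forecast_std(phi, sigma, horizon):
    """Standard deviation of the h-step-ahead forecast error for an AR(1) process."""
    variance = sigma**2 * sum(phi ** (2 * j) for j in range(horizon))
    return math.sqrt(variance)


phi, sigma = 0.8, 1.0  # hypothetical AR coefficient and noise standard deviation
z = 1.96               # approximate multiplier for a 95% interval

# Half-width of the 95% interval at horizons 1 through 12
half_widths = [z * ar1_forecast_std(phi, sigma, h) for h in range(1, 13)]
for h, w in zip(range(1, 13), half_widths):
    print(f"h={h:2d}  half-width={w:.3f}")
```

For a stationary AR(1) model the half-width levels off toward z * sigma / sqrt(1 - phi^2). For models with differencing (d >= 1), the forecast error variance keeps growing with the horizon, so the band continues to widen.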
With the ability to generate forecasts and quantify their uncertainty, you now have a powerful tool for predicting future trends based on historical patterns identified by the ARIMA model. The next chapter will extend these ideas to handle seasonality using SARIMA models.
© 2025 ApX Machine Learning