Standard regression techniques, including gradient boosting configured for mean prediction (e.g., using squared error loss), focus on estimating the conditional mean of the target variable, E[Y∣X]. While useful, this provides only a partial picture of the relationship between features and the target. In many applications, understanding the entire conditional distribution of the target, or specific parts of it like the tails, is more important. For instance, in risk management, predicting the 95th percentile of potential losses is more informative than predicting the average loss. Similarly, in resource planning, understanding the range of likely outcomes (e.g., 10th to 90th percentile) helps in making more informed decisions.
Quantile regression allows us to model the conditional quantiles of the target variable. Instead of just predicting the center of the distribution, we can predict points below which a certain proportion of the data lies. For example, the 0.5 quantile corresponds to the median, the 0.25 quantile to the first quartile, and the 0.9 quantile to the 90th percentile.
Gradient boosting frameworks can be effectively adapted for quantile regression by employing a specific loss function: the quantile loss, often called the pinball loss.
For a given quantile level α∈(0,1), the quantile loss function is defined as:
$$
L_\alpha(y, \hat{y}) =
\begin{cases}
\alpha \, (y - \hat{y}) & \text{if } y - \hat{y} > 0 \\
(1 - \alpha) \, (\hat{y} - y) & \text{if } y - \hat{y} \le 0
\end{cases}
$$
Here, y is the true value and y^ is the predicted value. This loss function penalizes errors asymmetrically.
When α=0.5 (the median), the penalty is (1/2)∣y−y^∣, which is equivalent to minimizing the Mean Absolute Error (MAE), making median regression robust to outliers. For α=0.9, underestimates are penalized more heavily (with weight 0.9) than overestimates (with weight 0.1), encouraging the model to predict higher values. Conversely, for α=0.1, overestimates are penalized more (weight 0.9) than underestimates (weight 0.1).
The pinball loss function penalizes errors asymmetrically based on the chosen quantile α. For α=0.5, it's symmetric (scaled MAE). For α=0.9, underestimation incurs a larger penalty. For α=0.1, overestimation incurs a larger penalty.
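To make the definition concrete, here is a minimal NumPy sketch of the pinball loss; the function name pinball_loss and the example values are illustrative only, not part of any library.
import numpy as np

def pinball_loss(y_true, y_pred, alpha):
    # Underestimates (positive errors) are weighted by alpha,
    # overestimates (negative errors) by (1 - alpha).
    error = y_true - y_pred
    return np.mean(np.where(error > 0, alpha * error, (1 - alpha) * (-error)))

# With alpha = 0.9, missing low by 2 units costs more than missing high by 2 units
print(pinball_loss(np.array([10.0]), np.array([8.0]), alpha=0.9))   # 0.9 * 2 = 1.8
print(pinball_loss(np.array([10.0]), np.array([12.0]), alpha=0.9))  # 0.1 * 2 = 0.2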
The core mechanism of gradient boosting involves sequentially adding weak learners (typically trees) that predict the negative gradient (pseudo-residuals) of the loss function with respect to the predictions from the previous iteration. To perform quantile regression, we simply use the quantile loss Lα instead of, for example, the squared error loss.
The negative gradient of the quantile loss with respect to the prediction y^ is:
$$
-\frac{\partial L_\alpha(y, \hat{y})}{\partial \hat{y}} =
\begin{cases}
\alpha & \text{if } y - \hat{y} > 0 \\
-(1 - \alpha) & \text{if } y - \hat{y} \le 0
\end{cases}
$$
At each boosting iteration m, the new tree hm(x) is trained to predict these pseudo-residuals, calculated using the current ensemble prediction Fm−1(x):
$$
r_{im} = -\left[ \frac{\partial L_\alpha(y_i, F(x_i))}{\partial F(x_i)} \right]_{F(x_i) = F_{m-1}(x_i)} =
\begin{cases}
\alpha & \text{if } y_i > F_{m-1}(x_i) \\
-(1 - \alpha) & \text{if } y_i \le F_{m-1}(x_i)
\end{cases}
$$
Notice that the pseudo-residuals are constant values (α or −(1−α)) that depend only on the sign of the error yi−Fm−1(xi). Each tree is fitted to these constants, and its splits partition the data into regions; implementations then typically set each terminal leaf value using the α-quantile of the residuals falling in that leaf (a line-search step), which is what pulls the ensemble prediction toward the desired quantile.
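As a small illustration of what each boosting step sees, the pseudo-residuals can be computed directly from the sign of the current errors. This is a sketch with illustrative names, not library internals.
import numpy as np

def quantile_pseudo_residuals(y_true, f_prev, alpha):
    # alpha where the current ensemble underpredicts,
    # -(1 - alpha) where it overpredicts (or is exact)
    return np.where(y_true - f_prev > 0, alpha, -(1.0 - alpha))

y = np.array([5.0, 7.0, 3.0])
f_prev = np.array([4.0, 8.0, 3.0])   # predictions from the previous iteration
print(quantile_pseudo_residuals(y, f_prev, alpha=0.9))  # [ 0.9 -0.1 -0.1]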
Major gradient boosting libraries provide built-in support for quantile regression.
Scikit-learn: The GradientBoostingRegressor offers quantile loss by setting the loss parameter to 'quantile' and specifying the desired quantile level via the alpha parameter.
from sklearn.ensemble import GradientBoostingRegressor
# Model for the 90th percentile
gbr_q90 = GradientBoostingRegressor(loss='quantile', alpha=0.90, n_estimators=100)
# Model for the 10th percentile
gbr_q10 = GradientBoostingRegressor(loss='quantile', alpha=0.10, n_estimators=100)
# Model for the median (50th percentile)
gbr_median = GradientBoostingRegressor(loss='quantile', alpha=0.50, n_estimators=100)
# gbr_q90.fit(X_train, y_train)
# gbr_q10.fit(X_train, y_train)
# gbr_median.fit(X_train, y_train)
# y_pred_q90 = gbr_q90.predict(X_test)
# y_pred_q10 = gbr_q10.predict(X_test)
# y_pred_median = gbr_median.predict(X_test)
LightGBM: The LGBMRegressor supports quantile regression by setting the objective parameter to 'quantile' and specifying the quantile level via the alpha parameter.
import lightgbm as lgb
# Model for the 75th percentile
lgbm_q75 = lgb.LGBMRegressor(objective='quantile', alpha=0.75, n_estimators=100)
# lgbm_q75.fit(X_train, y_train)
# y_pred_q75 = lgbm_q75.predict(X_test)
XGBoost: XGBoost also supports quantile regression by setting the objective parameter to 'reg:quantileerror'. Native support for this objective is relatively recent (XGBoost 2.0 and later), where the quantile level is passed via the quantile_alpha parameter, so check the documentation for your installed version. On older versions, a common alternative is to implement the pinball loss as a custom objective, supplying its gradient and a constant surrogate Hessian (the pinball loss has zero curvature almost everywhere).
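As a hedged sketch, assuming XGBoost 2.0 or later where the scikit-learn wrapper accepts objective='reg:quantileerror' together with a quantile_alpha parameter, a 90th-percentile model could be configured as follows; check the documentation for your installed version before relying on this.
import xgboost as xgb

# Model for the 90th percentile (native quantile objective, XGBoost >= 2.0 assumed)
xgb_q90 = xgb.XGBRegressor(objective='reg:quantileerror', quantile_alpha=0.90, n_estimators=100)

# xgb_q90.fit(X_train, y_train)
# xgb_pred_q90 = xgb_q90.predict(X_test)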
CatBoost: CatBoost supports quantile regression by setting the loss_function parameter to 'Quantile:alpha=...', for example 'Quantile:alpha=0.8'.
from catboost import CatBoostRegressor
# Model for the 20th percentile
cat_q20 = CatBoostRegressor(loss_function='Quantile:alpha=0.2', iterations=100)
# cat_q20.fit(X_train, y_train)
# y_pred_q20 = cat_q20.predict(X_test)
Important Note: To predict multiple quantiles (e.g., 10th, 50th, and 90th percentiles), you typically need to train a separate gradient boosting model for each desired quantile α. Each model minimizes the corresponding quantile loss function.
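Each model should also be evaluated at its own quantile level. One way to do this, assuming the fitted scikit-learn models and the test split from the earlier snippet exist, is scikit-learn's mean_pinball_loss metric (available in recent scikit-learn versions):
from sklearn.metrics import mean_pinball_loss

# Score each model with the pinball loss at the same alpha it was trained on;
# lower values indicate a better fit for that quantile.
# print(mean_pinball_loss(y_test, gbr_q10.predict(X_test), alpha=0.10))
# print(mean_pinball_loss(y_test, gbr_median.predict(X_test), alpha=0.50))
# print(mean_pinball_loss(y_test, gbr_q90.predict(X_test), alpha=0.90))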
Imagine predicting electricity demand. While predicting the average demand is useful, predicting the 10th and 90th percentiles provides an operational range, helping grid operators prepare for low-demand and high-demand scenarios.
Example showing actual data points and predictions from three separate gradient boosting models trained for the 10th percentile (Q10), median (Q50), and 90th percentile (Q90). The interval between Q10 and Q90 provides an 80% prediction interval.
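A quick sanity check on such an interval is its empirical coverage on held-out data, which should be close to 80% for a Q10 to Q90 interval. The arrays below are toy values used only to show the computation; with real models you would substitute y_test, y_pred_q10, and y_pred_q90 from the earlier snippets.
import numpy as np

# Toy stand-ins for test targets and the Q10/Q90 predictions
y_test_demo = np.array([100.0, 120.0, 90.0, 150.0, 110.0])
q10_demo = np.array([85.0, 100.0, 80.0, 120.0, 95.0])
q90_demo = np.array([130.0, 145.0, 115.0, 170.0, 140.0])

# Fraction of targets falling inside the [Q10, Q90] interval
coverage = np.mean((y_test_demo >= q10_demo) & (y_test_demo <= q90_demo))
print(f"Empirical coverage of the Q10-Q90 interval: {coverage:.0%}")  # 100% on this toy data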
Advantages: Quantile models describe more of the conditional distribution than a single mean prediction, covering tails, medians, and prediction intervals; the median model (α=0.5) inherits the outlier robustness of absolute error; and the pinball loss is supported out of the box by the major boosting libraries discussed above.
Considerations: A separate model must be trained and tuned for each quantile level, which multiplies training and maintenance cost. Because the models are fitted independently, their predictions can occasionally cross (for example, a predicted 10th percentile above the predicted 90th), which may require post-processing such as sorting the predicted quantiles for each observation.
In summary, gradient boosting offers a powerful and flexible approach to quantile regression by minimizing the quantile (pinball) loss function. This extends the capabilities of these algorithms to applications where understanding prediction intervals or specific parts of the conditional distribution is essential. By training separate models for different α values, you can construct detailed distributional forecasts using the strengths of boosting algorithms.