As discussed previously, standard regression techniques often focus on predicting the conditional mean of the target variable. Quantile regression, however, allows us to model conditional quantiles (like the median, 10th percentile, or 90th percentile), providing a more comprehensive understanding of the relationship between predictors and the target variable's distribution. To perform quantile regression using gradient boosting, we need to define and implement the appropriate loss function: the quantile loss, also known as the pinball loss.
The quantile loss function for a specific quantile α ∈ (0, 1) measures the error between the true value y and the predicted quantile ŷ. It is defined as:
$$
L_\alpha(y, \hat{y}) =
\begin{cases}
\alpha \,(y - \hat{y}) & \text{if } y - \hat{y} > 0 \\
(1 - \alpha)\,(\hat{y} - y) & \text{if } y - \hat{y} \le 0
\end{cases}
$$

This can be written more compactly using the indicator function I(⋅):
$$
L_\alpha(y, \hat{y}) = (y - \hat{y})\,\bigl(\alpha - I(y - \hat{y} < 0)\bigr)
$$

Notice the asymmetry: when the model underpredicts (y − ŷ > 0), the error is weighted by α, and when it overpredicts (y − ŷ ≤ 0), it is weighted by 1 − α. The choice of α therefore controls how heavily errors on each side are penalized.
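To make this asymmetry concrete, here is a minimal NumPy sketch of the loss (the helper name pinball_loss and the example values are ours, not from any library) that checks both branches for α = 0.9:

import numpy as np

def pinball_loss(y, y_hat, alpha):
    """Pinball (quantile) loss: weighted by alpha for underprediction, 1 - alpha otherwise."""
    error = y - y_hat
    return np.where(error > 0, alpha * error, (alpha - 1) * error)

alpha = 0.9
y_true = 10.0
print(pinball_loss(y_true, 8.0, alpha))   # underprediction by 2 -> 0.9 * 2 = 1.8
print(pinball_loss(y_true, 12.0, alpha))  # overprediction by 2  -> 0.1 * 2 = 0.2

With α = 0.9, the same absolute error costs nine times more when the model predicts below the true value than when it predicts above it.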
The following plot visualizes the pinball loss for different values of α.
The plot shows the characteristic 'pinball' shape of the quantile loss. Note how the slope changes at zero error, and the asymmetry depends on the quantile α. For α=0.9, positive errors (underpredictions) are penalized more, while for α=0.1, negative errors (overpredictions) incur a larger penalty. α=0.5 treats positive and negative errors symmetrically.
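The plot can be reproduced with a short matplotlib sketch (assuming numpy and matplotlib are installed; the chosen α values and axis labels are ours):

import numpy as np
import matplotlib.pyplot as plt

errors = np.linspace(-2, 2, 401)  # error = y - y_hat
for alpha in (0.1, 0.5, 0.9):
    loss = np.where(errors > 0, alpha * errors, (alpha - 1) * errors)
    plt.plot(errors, loss, label=f"alpha = {alpha}")

plt.xlabel("Error (y - y_hat)")
plt.ylabel("Pinball loss")
plt.title("Quantile (pinball) loss for different alpha")
plt.legend()
plt.show()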
Gradient boosting algorithms iteratively fit base learners (typically trees) to the negative gradient of the loss function with respect to the current prediction. Therefore, we need the first derivative (gradient) of Lα(y,y^) with respect to y^. Some advanced implementations like XGBoost and LightGBM also use the second derivative (Hessian) for faster convergence and regularization.
Let's find the gradient:
$$
g = \frac{\partial L_\alpha(y, \hat{y})}{\partial \hat{y}} = I(y - \hat{y} < 0) - \alpha = I(\hat{y} > y) - \alpha
$$

So the gradient is −α whenever the prediction falls below the true value (ŷ < y) and 1 − α whenever it falls above (ŷ > y).
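As a quick sanity check on this derivation, the closed-form gradient can be compared against a central finite difference of the loss (a standalone sketch; the helper names pinball_loss and pinball_grad are ours):

import numpy as np

def pinball_loss(y, y_hat, alpha):
    error = y - y_hat
    return np.where(error > 0, alpha * error, (alpha - 1) * error)

def pinball_grad(y, y_hat, alpha):
    # g = I(y_hat > y) - alpha
    return float(y_hat > y) - alpha

alpha, y, h = 0.9, 5.0, 1e-6
for y_hat in (3.0, 7.0):  # one prediction below the true value, one above
    numeric = float((pinball_loss(y, y_hat + h, alpha) - pinball_loss(y, y_hat - h, alpha)) / (2 * h))
    print(f"y_hat={y_hat}: numeric={numeric:.3f}, analytic={pinball_grad(y, y_hat, alpha):.3f}")
# Expected: -0.900 below the true value, 0.100 above it.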
Now, consider the Hessian:
$$
h = \frac{\partial^2 L_\alpha(y, \hat{y})}{\partial \hat{y}^2} = \frac{\partial g}{\partial \hat{y}}
$$

The gradient is a step function, which means its derivative is zero everywhere except at the point ŷ = y, where it is undefined (mathematically, it involves a Dirac delta function). Standard gradient boosting implementations, particularly those using second-order approximations like XGBoost, require a defined Hessian.
How do we handle this? There are two common approaches:

- Constant Hessian: supply a small positive constant (often 1) in place of the true second derivative. The gradient still drives the fit, and the constant keeps the second-order update numerically stable.
- Built-in objectives: many libraries expose a quantile objective directly (e.g., objective='quantile'). In these cases, the library handles the Hessian internally, possibly using approximations or specific algorithms suited for non-smooth objectives. Using the built-in objective is generally the recommended approach if available (a minimal sketch appears at the end of this section).

Let's see how you would structure a custom quantile loss function for libraries like XGBoost or LightGBM in Python. These libraries expect a function that takes the current predictions (preds) and the true labels (via dtrain, which contains y) and returns the gradient and Hessian for each sample.
Here's a Python function structure for a custom quantile objective:
import numpy as np

def quantile_objective(alpha):
    """
    Custom objective function for quantile regression.

    Parameters:
        alpha (float): The target quantile, must be in (0, 1).

    Returns:
        callable: A function compatible with XGBoost/LightGBM custom objectives.
    """
    def objective_function(preds, dtrain):
        """
        Calculates gradient and Hessian for quantile loss.

        Parameters:
            preds (np.ndarray): Current model predictions.
            dtrain: Data container (e.g., xgboost.DMatrix or lightgbm.Dataset)
                containing true labels via dtrain.get_label().

        Returns:
            grad (np.ndarray): The gradient of the loss with respect to preds.
            hess (np.ndarray): The Hessian of the loss with respect to preds.
        """
        labels = dtrain.get_label()
        errors = preds - labels  # Note: using preds - labels aligns with the I(preds > y) convention

        # Gradient: 1 - alpha where the model overpredicts (errors > 0), -alpha otherwise
        grad = np.where(errors > 0, 1 - alpha, -alpha)

        # Hessian (using a small constant approximation).
        # A small positive constant helps ensure stability in the algorithm;
        # the exact value might require tuning or experimentation.
        hess = np.full_like(preds, 1.0)  # Or a smaller value like 1e-6

        return grad, hess

    return objective_function
# Example Usage
# alpha_value = 0.75  # Target the 75th percentile
# custom_obj = quantile_objective(alpha_value)

# In XGBoost:
# model = xgb.train(params, dtrain, num_boost_round=100, obj=custom_obj)

# In LightGBM (3.x and earlier):
# model = lgb.train(params, dtrain, num_boost_round=100, fobj=custom_obj)
# In newer LightGBM versions (4.x), the fobj argument was removed; pass the callable
# through params instead, e.g. params['objective'] = custom_obj before calling lgb.train.
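For a concrete end-to-end run with XGBoost, here is a minimal sketch reusing the quantile_objective factory defined above (xgboost must be installed; the synthetic data, hyperparameters, and coverage check are illustrative assumptions, not a reference configuration):

import numpy as np
import xgboost as xgb

rng = np.random.default_rng(42)
X = rng.uniform(0.0, 10.0, size=(1000, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.5, size=1000)

dtrain = xgb.DMatrix(X, label=y)
custom_obj = quantile_objective(0.9)  # target the 90th percentile

params = {
    "max_depth": 3,
    "eta": 0.1,
    "base_score": float(np.median(y)),  # start near the data rather than at the default 0.5
}
model = xgb.train(params, dtrain, num_boost_round=200, obj=custom_obj)

pred_q90 = model.predict(dtrain)
# Roughly 90% of the training targets should fall at or below the predicted quantile.
print("empirical coverage:", np.mean(y <= pred_q90))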
Important Considerations:

- Hessian approximation: the constant Hessian used in the custom objective is only an approximation; as noted in the code comments, the exact value may require tuning or experimentation.
- Built-in objectives: libraries often provide native quantile objectives (e.g., objective='quantile' with the alpha parameter in LightGBM), which are often optimized and preferred over custom implementations. CatBoost also supports quantile loss functions. A minimal sketch of this built-in route closes the section below.

By implementing or utilizing the quantile loss function, you equip gradient boosting models with the capability to move beyond mean prediction and estimate conditional quantiles, offering much richer insights into the underlying data distribution for various specialized tasks.
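As a closing reference point, here is a minimal sketch of the built-in route using LightGBM's scikit-learn interface (lightgbm must be installed; the synthetic data and hyperparameters are illustrative):

import numpy as np
import lightgbm as lgb

rng = np.random.default_rng(42)
X = rng.uniform(0.0, 10.0, size=(1000, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.5, size=1000)

# Built-in quantile objective: LightGBM handles the gradient/Hessian details internally.
model = lgb.LGBMRegressor(objective="quantile", alpha=0.9, n_estimators=200, learning_rate=0.1)
model.fit(X, y)

pred_q90 = model.predict(X)
print("empirical coverage:", np.mean(y <= pred_q90))  # should be roughly 0.9 on the training data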