Quantile Transformation offers a distinct approach to modifying feature distributions. Standardization and Normalization rescale features without changing the shape of their distributions. Log and Box-Cox transformations reshape a distribution using a fixed functional form, in an attempt to make it more Gaussian-like. Quantile Transformation is instead a non-linear, non-parametric mapping: it maps a feature's probability distribution to a specified uniform or normal distribution, regardless of the original shape. It does this by using the ranks, or quantiles, of the data points.
The core idea behind quantile transformation is to estimate the empirical cumulative distribution function (CDF) of a feature and then use this CDF to map the original values to the desired output distribution.
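To make the idea concrete, here is a minimal from-scratch sketch, assuming a 1-D sample with no tied values. The empirical CDF value of each point is estimated from its rank. For uniform output that value is used directly, u = F̂(x). For normal output it is passed through the inverse standard-normal CDF, z = Φ⁻¹(F̂(x)). The function name `empirical_quantile_transform` and the clipping threshold `eps` are illustrative choices, not part of scikit-learn. The real `QuantileTransformer` estimates `n_quantiles` reference quantiles and interpolates between them, which also lets it transform unseen values. This sketch skips that step.

```python
import numpy as np
from scipy.stats import norm


def empirical_quantile_transform(x, output_distribution="uniform", eps=1e-7):
    """Hypothetical helper: map a 1-D sample to a uniform or standard-normal shape.

    Illustrative sketch only. It assumes no tied values (ties would need
    averaged ranks) and evaluates the empirical CDF only at the sample points.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    # argsort of argsort gives the 0-based rank of every value
    ranks = np.argsort(np.argsort(x))
    # Empirical CDF value of each point, rescaled to span [0, 1]
    u = ranks / (n - 1)
    if output_distribution == "uniform":
        return u
    # Keep probabilities strictly inside (0, 1) so the inverse normal CDF stays finite
    u = np.clip(u, eps, 1 - eps)
    return norm.ppf(u)


rng = np.random.default_rng(0)
x = rng.exponential(scale=2, size=1000)        # right-skewed sample
u = empirical_quantile_transform(x)            # evenly spread over [0, 1]
z = empirical_quantile_transform(x, "normal")  # roughly standard-normal shape
```

Only the ordering of `x` affects `u` and `z`. Replacing the largest value with any larger number would leave every output unchanged.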
Because this method relies on the rank order of the data points rather than their absolute values, it is inherently robust to outliers. Outliers are mapped to the extreme ends of the target distribution (e.g., close to 0 or 1 for uniform output, or large negative/positive values for normal output), but they do not distort the transformation of the other points. This is unlike StandardScaler or MinMaxScaler, where a single extreme value shifts the mean and variance, or the minimum and maximum, used to scale every point.
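A small experiment can illustrate this contrast. The sketch below uses synthetic data invented for the demonstration: 200 normally distributed values plus one artificial outlier at 10,000. It compares how much of the [0, 1] output range the 200 inliers occupy after MinMaxScaler and after QuantileTransformer. The `spread` helper is a hypothetical name used only here.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, QuantileTransformer

rng = np.random.default_rng(0)
inliers = rng.normal(loc=50, scale=5, size=(200, 1))
X = np.vstack([inliers, [[10_000.0]]])  # append one extreme outlier as the last row

minmax = MinMaxScaler().fit_transform(X)
quantile = QuantileTransformer(
    n_quantiles=X.shape[0], output_distribution="uniform"
).fit_transform(X)


def spread(col):
    # Range covered by the 200 inliers after scaling (the last row is the outlier)
    return col[:-1].max() - col[:-1].min()


print(f"MinMaxScaler inlier spread:        {spread(minmax):.4f}")
print(f"QuantileTransformer inlier spread: {spread(quantile):.4f}")
print(f"Outlier after quantile transform:  {quantile[-1, 0]:.4f}")
```

With min-max scaling, the outlier claims almost the whole output range and squeezes the inliers into a tiny sliver near 0. With the quantile transform, the outlier simply takes the top position and the inliers still span nearly all of [0, 1].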
Scikit-learn provides the sklearn.preprocessing.QuantileTransformer class for this purpose. Let's see how to use it.
```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import QuantileTransformer
import plotly.graph_objects as go
from plotly.subplots import make_subplots

# Generate some skewed data
np.random.seed(42)
data_original = np.random.exponential(scale=2, size=1000).reshape(-1, 1) + 1  # Add 1 to avoid issues with zero if using log later

# Initialize transformers
qt_uniform = QuantileTransformer(output_distribution='uniform', n_quantiles=1000, random_state=42)
qt_normal = QuantileTransformer(output_distribution='normal', n_quantiles=1000, random_state=42)

# Apply transformations
data_uniform = qt_uniform.fit_transform(data_original)
data_normal = qt_normal.fit_transform(data_original)

# Create DataFrame for easier plotting
df = pd.DataFrame({
    'Original': data_original.flatten(),
    'Uniform Quantile': data_uniform.flatten(),
    'Normal Quantile': data_normal.flatten()
})

# --- Visualization ---
fig = make_subplots(rows=1, cols=3, subplot_titles=('Original Exponential Data', 'Uniform Quantile Transformed', 'Normal Quantile Transformed'))
fig.add_trace(go.Histogram(x=df['Original'], name='Original', marker_color='#4dabf7'), row=1, col=1)
fig.add_trace(go.Histogram(x=df['Uniform Quantile'], name='Uniform', marker_color='#38d9a9'), row=1, col=2)
fig.add_trace(go.Histogram(x=df['Normal Quantile'], name='Normal', marker_color='#be4bdb'), row=1, col=3)
fig.update_layout(
    title_text='Effect of Quantile Transformation on Skewed Data',
    bargap=0.1,
    showlegend=False,
    height=350,
    margin=dict(l=20, r=20, t=60, b=20)
)

# Render the figure (inline in a notebook, or in a browser tab otherwise)
fig.show()
```
Figure: Effect of Quantile Transformation on Skewed Data. Left: the distribution of the original exponential data. Middle: the distribution after uniform quantile transformation, with values spread out evenly. Right: the distribution after normal quantile transformation, which now resembles a Gaussian shape.
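The histograms can also be checked numerically. The sketch below regenerates the same data as the example above and prints the mean, standard deviation, and sample skewness of each version, using `scipy.stats.skew` (an addition not used in the original example). A symmetric distribution has skewness near 0, while an exponential distribution is strongly right-skewed.

```python
import numpy as np
from scipy.stats import skew
from sklearn.preprocessing import QuantileTransformer

# Same skewed data as in the example above
np.random.seed(42)
data_original = np.random.exponential(scale=2, size=1000).reshape(-1, 1) + 1

data_uniform = QuantileTransformer(
    output_distribution="uniform", n_quantiles=1000, random_state=42
).fit_transform(data_original)
data_normal = QuantileTransformer(
    output_distribution="normal", n_quantiles=1000, random_state=42
).fit_transform(data_original)

for name, arr in [("original", data_original), ("uniform", data_uniform), ("normal", data_normal)]:
    values = arr.ravel()
    print(f"{name:>8}: mean={values.mean():.3f}  std={values.std():.3f}  skew={skew(values):.3f}")
```

Expect the uniform output to be centered near 0.5 and the normal output to be centered near 0 with a standard deviation close to 1, both with skewness near 0.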
Quantile transformation offers several advantages and important considerations:
fit vs. transform: As with other scalers, call fit (or fit_transform) on the training data only, then call transform on both the training and test data. Calling fit_transform on the entire dataset (including the test data) before splitting causes data leakage: information from the test set implicitly influences the training process, which produces overly optimistic performance estimates.
Consider using quantile transformation when:
Although it is a powerful tool, quantile transformation can make a model harder to interpret, because the transformed values no longer have a direct, linear relationship to the original scale. For many predictive modeling tasks, however, the gain in model performance can outweigh this drawback.
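Both practical points, fitting only on training data and relating transformed values back to the original scale, can be combined in one workflow. The sketch below uses illustrative synthetic data and an arbitrary 75/25 split. It fits the transformer on the training split, applies it to the test split, and then uses `inverse_transform` to map transformed values back into original units. Test values outside the training range are mapped to the ends of the output range, so after the inverse transform they come back clipped to the training minimum or maximum.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import QuantileTransformer

rng = np.random.default_rng(42)
X = rng.exponential(scale=2, size=(2000, 1)) + 1  # synthetic skewed feature

# Split before fitting so the test rows cannot influence the learned quantiles
X_train, X_test = train_test_split(X, test_size=0.25, random_state=42)

qt = QuantileTransformer(output_distribution="uniform", n_quantiles=1000, random_state=42)
X_train_t = qt.fit_transform(X_train)  # quantiles estimated from training data only
X_test_t = qt.transform(X_test)        # test data mapped with the training quantiles

# inverse_transform maps transformed values back to original units;
# for uniform output, 0.5 corresponds approximately to the training median
median_back = qt.inverse_transform([[0.5]])[0, 0]
X_test_back = qt.inverse_transform(X_test_t)

# The round trip is exact only for test values inside the training range
inside = (X_test >= X_train.min()) & (X_test <= X_train.max())
print("training median:", np.median(X_train), "| inverse of 0.5:", median_back)
print("round trip exact inside training range:", np.allclose(X_test_back[inside], X_test[inside]))
```

When a model is evaluated with cross-validation, wrapping the transformer and the model in a scikit-learn `Pipeline` applies the same fit-on-training-folds-only discipline automatically.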
© 2026 ApX Machine Learning