While the log and Box-Cox transformations are effective for handling positively skewed data, they run into limitations when your data includes zero or negative values. The log transformation is undefined for non-positive numbers, and the standard Box-Cox transformation requires strictly positive input. This is where the Yeo-Johnson transformation comes into play.
Developed by I.K. Yeo and R.A. Johnson in 2000, this transformation belongs to the same family of power transformations as Box-Cox but extends its applicability to data containing non-positive values. The goal remains the same: to stabilize variance, reduce skewness, and make the data distribution more closely resemble a normal (Gaussian) distribution, which can benefit certain modeling algorithms.
Like Box-Cox, the Yeo-Johnson transformation seeks an optimal parameter, often denoted as λ, to apply a power transformation. However, it uses slightly different formulas depending on whether the data point x is non-negative or negative:
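$$
\psi(x, \lambda) =
\begin{cases}
\dfrac{(x + 1)^{\lambda} - 1}{\lambda} & \lambda \neq 0,\ x \geq 0 \\[4pt]
\ln(x + 1) & \lambda = 0,\ x \geq 0 \\[4pt]
-\dfrac{(-x + 1)^{2 - \lambda} - 1}{2 - \lambda} & \lambda \neq 2,\ x < 0 \\[4pt]
-\ln(-x + 1) & \lambda = 2,\ x < 0
\end{cases}
$$

For x ≥ 0 and λ ≠ 0, this is essentially the Box-Cox transformation applied to x + 1; the branches for negative x mirror that behavior so the function stays smooth across zero.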
You don't typically need to memorize these formulas. The important concept is that the transformation provides a continuous function across zero and handles positive, zero, and negative values consistently to achieve symmetry. The optimal λ is usually determined computationally, often using maximum likelihood estimation, to find the transformation that makes the resulting data distribution as close to Gaussian as possible.
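For a concrete look at the estimation step, here is a minimal sketch using SciPy's scipy.stats.yeojohnson, which estimates λ by maximum likelihood when called without a fixed λ. The sample values below are arbitrary, chosen only to include positive, zero, and negative entries:

import numpy as np
from scipy import stats

# Arbitrary sample containing positive, zero, and negative values
x = np.array([-3.0, -1.5, 0.0, 0.5, 1.2, 4.0, 9.5])

# With no lambda supplied, SciPy estimates it by maximum likelihood
transformed, lam = stats.yeojohnson(x)
print(f"Estimated lambda: {lam:.4f}")
print("Transformed values:", np.round(transformed, 3))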
Scikit-learn provides a convenient implementation through the PowerTransformer class; you simply specify method='yeo-johnson'.
Let's generate some skewed data that includes non-positive values and apply the transformation:
import numpy as np
import pandas as pd
from sklearn.preprocessing import PowerTransformer
import matplotlib.pyplot as plt
import seaborn as sns
# Generate skewed data including zero and negative values
np.random.seed(42)
data_positive = np.random.exponential(scale=2, size=70)
data_nonpositive = -np.random.exponential(scale=2, size=30) + 1 # Shift to include 0 and negatives
skewed_data = np.concatenate([data_positive, data_nonpositive])
skewed_data = skewed_data.reshape(-1, 1) # Reshape for Scikit-learn transformer
# Initialize and apply Yeo-Johnson transformation
yj_transformer = PowerTransformer(method='yeo-johnson', standardize=True)
# standardize=True applies zero-mean, unit-variance scaling *after* the transformation
# Set to False if you only want the power transformation
transformed_data_yj = yj_transformer.fit_transform(skewed_data)
# Print the optimal lambda found
print(f"Optimal lambda found: {yj_transformer.lambdas_[0]:.4f}")
# Prepare data for plotting
df = pd.DataFrame({
'Original': skewed_data.flatten(),
'Yeo-Johnson Transformed': transformed_data_yj.flatten()
})
# Visualize distributions before and after
fig, axes = plt.subplots(1, 2, figsize=(12, 5))
sns.histplot(df['Original'], kde=True, ax=axes[0], color='#4263eb')
axes[0].set_title('Original Data Distribution')
sns.histplot(df['Yeo-Johnson Transformed'], kde=True, ax=axes[1], color='#37b24d')
axes[1].set_title('Yeo-Johnson Transformed Data Distribution')
plt.tight_layout()
plt.show()
Running this code will likely show that the original data has significant skew, while the histogram of the transformed data appears much more symmetric and bell-shaped. The standardize=True argument (default) ensures the output also has zero mean and unit variance, combining the transformation and scaling steps.
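To go beyond the visual check, you can quantify the asymmetry. Continuing the example above, a quick comparison with scipy.stats.skew (a sample skewness estimate; values near zero indicate a roughly symmetric distribution) is one way to confirm the improvement:

from scipy.stats import skew

# Sample skewness: values near 0 suggest a roughly symmetric distribution
print(f"Skewness before: {skew(skewed_data.flatten()):.3f}")
print(f"Skewness after:  {skew(transformed_data_yj.flatten()):.3f}")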
Figure: comparison of data distributions. The left plot shows the original skewed data containing negative values, while the right plot shows the distribution after applying the Yeo-Johnson transformation, appearing much closer to a normal distribution. Sample data is used for illustration.
Fit the PowerTransformer only on your training dataset, then use the same fitted transformer (with the learned λ) to transform your validation and test sets. This avoids data leakage and keeps the transformation consistent across splits; a sketch of this workflow follows the next note.

Also note the standardize parameter in PowerTransformer. If True (the default), it applies Z-score scaling after the Yeo-Johnson transformation. If you need the raw power-transformed values without standardization, set standardize=False.
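Here is a minimal sketch of that train/test workflow. The data and the split are illustrative, not part of the example above:

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PowerTransformer

# Illustrative feature column that includes negative values
rng = np.random.default_rng(0)
X = rng.exponential(scale=2, size=(100, 1)) - 1

X_train, X_test = train_test_split(X, test_size=0.25, random_state=0)

yj = PowerTransformer(method='yeo-johnson')
X_train_t = yj.fit_transform(X_train)  # learn lambda from the training set only
X_test_t = yj.transform(X_test)        # reuse the learned lambda; never refit on test data

print(f"Lambda learned on training data: {yj.lambdas_[0]:.4f}")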
Yeo-Johnson provides a valuable and flexible tool for normalizing the distribution of numerical features, especially when those features span the full range of real numbers, overcoming a key limitation of the Box-Cox method. It is a common technique for preparing data for algorithms that are sensitive to feature distributions.