While Standardization centers data around zero and scales it based on standard deviation, Normalization, often referred to as Min-Max Scaling, takes a different approach. Its primary goal is to rescale numerical features to lie within a specific, predefined range, most commonly [0, 1].
This technique is particularly useful when you need your features bounded within a consistent interval. This is often beneficial for algorithms that don't make strong assumptions about the distribution of the data (unlike Standardization, which works well for approximately Gaussian distributions) or for algorithms that compute distances or use gradient descent, where features on vastly different scales can cause issues. Image processing often uses normalization to scale pixel intensities, which naturally fall in a range like [0, 255], down to [0, 1].
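As a quick illustration of the pixel case, intensities with a known [0, 255] range can be normalized with a single division. This is a minimal NumPy sketch with made-up pixel values, not code from any particular imaging library:

```python
import numpy as np

# A hypothetical 8-bit grayscale "image": integer intensities in [0, 255]
pixels = np.array([[0, 64, 128],
                   [192, 255, 32]], dtype=np.uint8)

# Because the min (0) and max (255) are known in advance for 8-bit images,
# Min-Max scaling reduces to a simple division by 255
normalized = pixels.astype(np.float64) / 255.0

print(normalized.min(), normalized.max())  # 0.0 1.0
```

Because the bounds are fixed by the data format, there is nothing to "learn" here; the general case below estimates the bounds from the data instead.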
The transformation is achieved using the following formula for each feature X:
$$X_{scaled} = \frac{X - X_{min}}{X_{max} - X_{min}}$$

Here:
- $X$ is the original value of the feature.
- $X_{min}$ is the minimum value of that feature (computed on the training data).
- $X_{max}$ is the maximum value of that feature (computed on the training data).

This formula linearly maps the original feature range $[X_{min}, X_{max}]$ to the new range [0, 1]. If a value equals the minimum ($X_{min}$), it is mapped to 0; if it equals the maximum ($X_{max}$), it is mapped to 1. All other values fall proportionally in between.
While [0, 1] is the most common target range, the formula can be generalized to scale to an arbitrary range [a,b]:
$$X_{scaled} = a + \frac{(X - X_{min})(b - a)}{X_{max} - X_{min}}$$

However, Scikit-learn's implementation defaults to the [0, 1] range, which is sufficient for most use cases.
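The generalized formula can be sketched directly in NumPy. The helper name `min_max_scale` below is purely illustrative, not part of any library:

```python
import numpy as np

def min_max_scale(x, a=0.0, b=1.0):
    """Linearly map the values in x from [x.min(), x.max()] to [a, b]."""
    x_min, x_max = x.min(), x.max()
    # Note: this divides by (x_max - x_min), so a constant feature
    # (x_max == x_min) would need special handling
    return a + (x - x_min) * (b - a) / (x_max - x_min)

x = np.array([10.0, 20.0, 30.0, 40.0, 50.0])

print(min_max_scale(x))             # values 0, 0.25, 0.5, 0.75, 1
print(min_max_scale(x, a=-1, b=1))  # values -1, -0.5, 0, 0.5, 1
```

Scikit-learn's `MinMaxScaler` implements the same mapping (its `feature_range` parameter corresponds to $[a, b]$) and additionally handles the constant-feature edge case internally.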
Scikit-learn provides a convenient implementation through the `MinMaxScaler` class in the `sklearn.preprocessing` module. Like other transformers in Scikit-learn, it follows the `fit` and `transform` pattern.
Important: The scaler must be fitted only on the training data. The minimum ($X_{min}$) and maximum ($X_{max}$) values learned during this `fit` step are then used to `transform` both the training data and any subsequent data (such as the validation or test set). This prevents information from the test set from leaking into the preprocessing step, ensuring a realistic evaluation of model performance.
Here's a basic example:
```python
import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split

# Sample data
data = {'Feature1': np.random.rand(100) * 100,
        'Feature2': np.random.rand(100) * 50 - 25}  # Includes negative values
df = pd.DataFrame(data)

# Add an outlier to demonstrate sensitivity
df.loc[100] = {'Feature1': 1000, 'Feature2': 200}

# Split data (essential BEFORE fitting the scaler)
X_train, X_test = train_test_split(df, test_size=0.2, random_state=42)

# Initialize the scaler
scaler = MinMaxScaler(feature_range=(0, 1))  # Default range

# Fit the scaler ONLY on the training data
scaler.fit(X_train)

# Transform both training and test data
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

# The result is a NumPy array. Convert back to a DataFrame if needed.
X_train_scaled_df = pd.DataFrame(X_train_scaled, columns=X_train.columns, index=X_train.index)
X_test_scaled_df = pd.DataFrame(X_test_scaled, columns=X_test.columns, index=X_test.index)

print("Original Training Data Sample:\n", X_train.head())
# print("\nMin learned:", scaler.data_min_)  # Min values learned from training data
# print("Max learned:", scaler.data_max_)   # Max values learned from training data
print("\nScaled Training Data Sample:\n", X_train_scaled_df.head())
print("\nScaled Test Data Sample:\n", X_test_scaled_df.head())

# Verify the range of scaled training data (should be close to 0 and 1)
print("\nScaled Training Min:\n", X_train_scaled_df.min())
print("\nScaled Training Max:\n", X_train_scaled_df.max())

# Note: Test data might fall outside [0, 1] if it contains values
# outside the range seen during training. This is expected behavior.
print("\nScaled Test Min:\n", X_test_scaled_df.min())
print("\nScaled Test Max:\n", X_test_scaled_df.max())
```
Notice how `scaler.fit()` is called only once, using `X_train`. Both `X_train` and `X_test` are then scaled using `scaler.transform()`.
Min-Max scaling compresses the data into the [0, 1] range but preserves the overall shape of the distribution. However, outliers can significantly impact the scaling of the other data points.
Distribution shape is preserved, but the range is compressed to [0, 1]. Notice how the presence of a single large outlier (1000) squashes the majority of the data points into a small portion of the [0, 1] range in the scaled version.
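The squashing effect can be made concrete with a small synthetic sketch (the values and seed below are chosen arbitrarily for illustration):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# 100 values spread over roughly [0, 100], plus one extreme outlier at 1000
rng = np.random.default_rng(0)
values = np.append(rng.uniform(0, 100, size=100), 1000.0).reshape(-1, 1)

scaled = MinMaxScaler().fit_transform(values)

# The outlier defines the top of the range, so every other point is
# compressed into less than a tenth of the [0, 1] interval
print((scaled[:-1] < 0.1).all())  # True
print(scaled[-1, 0])              # the outlier itself maps to (approximately) 1
```

A single extreme value therefore dictates the scale for the entire feature, which is exactly the sensitivity the figure above illustrates.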
Advantages:
- Guarantees that features lie within a fixed, bounded range (by default [0, 1]).
- Preserves the shape of the original distribution and the relative ordering of values.
- Simple to compute and easy to interpret.
Disadvantages:
- Highly sensitive to outliers: a single extreme value can squash the rest of the data into a narrow sub-range.
- The minimum and maximum learned from training data may not cover new data, so transformed test values can fall outside the target range.
Choose Min-Max scaling when:
- You need features bounded within a specific interval, such as [0, 1].
- Your algorithm makes no strong assumptions about the distribution of the data.
- Your data has natural, known bounds and few outliers, as with pixel intensities.
If your data contains significant outliers, or if your algorithm benefits from zero-centered data with unit variance (like PCA, SVM, Logistic Regression, Linear Regression), Standardization or Robust Scaling (discussed next) are often better choices.
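To see why outliers favor the alternatives, here is a brief comparison on toy data with one extreme value. `RobustScaler` (from `sklearn.preprocessing`) centers on the median and scales by the interquartile range, so a single outlier has far less influence:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, RobustScaler

# Toy data with one large outlier
x = np.array([1.0, 2.0, 3.0, 4.0, 100.0]).reshape(-1, 1)

# MinMaxScaler: the outlier defines the range, squashing the rest near 0
print(MinMaxScaler().fit_transform(x).ravel())

# RobustScaler: median and IQR are unaffected by the outlier,
# so the bulk of the data keeps a usable spread around 0
print(RobustScaler().fit_transform(x).ravel())
```

With Min-Max scaling the first four points all land below 0.04, while the robust version spreads them evenly between -1 and 0.5.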
In summary, Min-Max scaling is a straightforward method for bringing features to a common scale, but its sensitivity to outliers requires careful consideration. Always remember to fit the scaler on your training data and transform both training and test sets consistently.
© 2025 ApX Machine Learning