Now that we have methods for handling categorical data, we turn to numerical features. The range and distribution of raw numerical values can directly impact the effectiveness of algorithms sensitive to feature scales, such as distance-based methods or those using gradient descent. This chapter focuses on techniques to prepare numerical data for modeling by adjusting its scale and distribution.
We will cover common scaling methods, including Standardization (Z-score scaling), which results in data with zero mean and unit variance ($Z = \frac{x - \mu}{\sigma}$), and Normalization (Min-Max scaling), which confines values to a specific interval like [0, 1]. We will also look at Robust Scaling for data containing outliers. Additionally, you will learn about transformations like Log, Box-Cox, and Yeo-Johnson, used to modify skewed distributions and make data more suitable for certain modeling assumptions. The chapter provides guidance on selecting and applying these techniques effectively using Scikit-learn.
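As a quick preview of the techniques covered in the sections below, here is a minimal sketch using Scikit-learn's `preprocessing` module. The synthetic skewed data, the random seed, and the injected outlier are illustrative assumptions for this preview, not data from the chapter.

```python
import numpy as np
from sklearn.preprocessing import (
    StandardScaler, MinMaxScaler, RobustScaler, PowerTransformer
)

# Illustrative data: a right-skewed feature with one large outlier
rng = np.random.default_rng(42)
X = np.concatenate([rng.lognormal(mean=0.0, sigma=0.75, size=99), [50.0]])
X = X.reshape(-1, 1)  # scikit-learn expects shape (n_samples, n_features)

# Standardization: zero mean, unit variance (Z = (x - mu) / sigma)
z = StandardScaler().fit_transform(X)

# Normalization: rescale values into the [0, 1] interval
mm = MinMaxScaler().fit_transform(X)

# Robust scaling: center on the median and scale by the IQR,
# so the outlier at 50.0 has far less influence on the result
rb = RobustScaler().fit_transform(X)

# Yeo-Johnson power transform: reduces skew and, unlike Box-Cox,
# also accepts zero and negative inputs
yj = PowerTransformer(method="yeo-johnson").fit_transform(X)

print("standardized mean/std:", z.mean().round(3), z.std().round(3))
print("min-max range:", mm.min().round(3), "to", mm.max().round(3))
```

Note that each transformer is fit on the data before transforming it; in practice you fit on the training set only and reuse the fitted transformer on validation and test data, a point revisited in the hands-on practical.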
4.1 The Need for Feature Scaling
4.2 Standardization (Z-score Scaling)
4.3 Normalization (Min-Max Scaling)
4.4 Robust Scaling for Outliers
4.5 Log Transformation for Skewed Data
4.6 Box-Cox Transformation
4.7 Yeo-Johnson Transformation
4.8 Quantile Transformation
4.9 Choosing the Right Scaling/Transformation Method
4.10 Hands-on Practical: Scaling and Transforming Features