Real-world datasets are often incomplete. Entries might be missing due to errors during data collection, transmission issues, or simply because the information was not available. These gaps, frequently represented as NaN (Not a Number), NULL, or other placeholders, can significantly interfere with data analysis and the performance of machine learning models. Many algorithms cannot function correctly when faced with missing values.
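As a minimal sketch of why these placeholders cause trouble, the plain-Python snippet below (standard library only) flags missing entries and shows how a single NaN silently turns a sum into NaN. The `readings` list and the `is_missing` helper are hypothetical names invented for this illustration, not part of any library.

```python
import math

# Hypothetical sensor readings; None and NaN both stand in for missing entries.
readings = [21.5, None, 19.8, float("nan"), 22.1]


def is_missing(value):
    """Return True for None or a floating-point NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


# Positions of the gaps and the values that were actually observed.
missing_positions = [i for i, v in enumerate(readings) if is_missing(v)]
observed = [v for v in readings if not is_missing(v)]

# NaN propagates through arithmetic, so a sum that only skips None is still NaN.
naive_total = sum(v for v in readings if v is not None)

# Averaging only the observed values gives a usable number.
mean_observed = sum(observed) / len(observed)

print(missing_positions)        # [1, 3]
print(math.isnan(naive_total))  # True
print(round(mean_observed, 2))
```

Note that NaN never compares equal to itself, which is why the check relies on `math.isnan` rather than `==`.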
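This chapter equips you with fundamental methods for managing this common issue. You will learn how to detect and visualize missing data, remove affected rows or columns, fill gaps with basic imputation (mean, median, or mode), and choose among these strategies.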
We will use practical examples to illustrate these techniques, providing a foundation for preparing cleaner, more reliable data.
2.1 What Are Missing Values?
2.2 Methods for Detecting Missing Data
2.3 Visualizing Missing Data Patterns
2.4 Strategy 1: Deleting Rows (Listwise Deletion)
2.5 Strategy 2: Deleting Columns
2.6 Strategy 3: Basic Imputation (Mean/Median/Mode)
2.7 Considerations for Choosing a Strategy
2.8 Handling Missing Data: Hands-on Practical
© 2025 ApX Machine Learning