Datasets often come with gaps, represented in Pandas as NaN (Not a Number). One straightforward approach to dealing with these gaps is simply to remove the rows or columns that contain them. This is often a reasonable first step, especially if only a small fraction of your data is missing or if a particular row or column has so many missing values that it's not informative.
Pandas provides the dropna() method for this purpose. Let's explore how it works.
By default, dropna() removes entire rows if any value in that row is NaN.
Consider this example DataFrame:
import pandas as pd
import numpy as np
data = {'col1': [1, 2, np.nan, 4, 5],
'col2': [np.nan, 7, 8, 9, 10],
'col3': [11, 12, 13, 14, np.nan],
'col4': ['A', 'B', 'C', 'D', 'E']}
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)
Output:
Original DataFrame:
col1 col2 col3 col4
0 1.0 NaN 11.0 A
1 2.0 7.0 12.0 B
2 NaN 8.0 13.0 C
3 4.0 9.0 14.0 D
4 5.0 10.0 NaN E
Now, let's use dropna() with its default settings:
df_dropped_rows = df.dropna() # Default is axis=0 (rows) and how='any'
print("\nDataFrame after dropping rows with any NaN:")
print(df_dropped_rows)
Output:
DataFrame after dropping rows with any NaN:
col1 col2 col3 col4
1 2.0 7.0 12.0 B
3 4.0 9.0 14.0 D
Notice that rows 0, 2, and 4 were removed because each contained at least one NaN value. Only rows 1 and 3, which had complete data across all columns, were kept.
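If you want to see which rows would be removed before actually dropping them, one option is to build the equivalent boolean mask yourself. The sketch below assumes the same example data as above; the variable name rows_with_nan is only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'col1': [1, 2, np.nan, 4, 5],
    'col2': [np.nan, 7, 8, 9, 10],
    'col3': [11, 12, 13, 14, np.nan],
    'col4': ['A', 'B', 'C', 'D', 'E'],
})

# True for every row that holds at least one NaN
rows_with_nan = df.isna().any(axis=1)

print(rows_with_nan.sum())   # number of rows the default dropna() would remove
print(df[~rows_with_nan])    # the rows the default dropna() would keep
```

Filtering with the inverted mask should give the same result as df.dropna(), which makes it a handy sanity check on larger datasets.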
The dropna() method has parameters to give you more control:
how parameter:
how='any' (default): Drop the row if any NaN values are present.
how='all': Drop the row only if all values in that row are NaN.
Let's create a DataFrame where one row is entirely NaN:
data_with_all_nan = {'col1': [1, np.nan, np.nan, 4],
'col2': [np.nan, 7, np.nan, 9],
'col3': [11, 12, np.nan, 14]}
df_all_nan = pd.DataFrame(data_with_all_nan)
print("\nOriginal DataFrame with an all-NaN row possibility:")
print(df_all_nan)
df_dropped_all = df_all_nan.dropna(how='all')
print("\nDataFrame after dropping rows with all NaN:")
print(df_dropped_all)
Output:
Original DataFrame with an all-NaN row possibility:
col1 col2 col3
0 1.0 NaN 11.0
1 NaN 7.0 12.0
2 NaN NaN NaN
3 4.0 9.0 14.0
DataFrame after dropping rows with all NaN:
col1 col2 col3
0 1.0 NaN 11.0
1 NaN 7.0 12.0
3 4.0 9.0 14.0
In this case, only row 2, where all values were NaN, was dropped when using how='all'.
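To make the difference between the two settings concrete, here is a small side-by-side comparison on the same data. It is only a sketch; the names kept_any and kept_all are made up for this example.

```python
import numpy as np
import pandas as pd

df_all_nan = pd.DataFrame({
    'col1': [1, np.nan, np.nan, 4],
    'col2': [np.nan, 7, np.nan, 9],
    'col3': [11, 12, np.nan, 14],
})

kept_any = df_all_nan.dropna(how='any')  # strict: drops rows 0, 1, and 2
kept_all = df_all_nan.dropna(how='all')  # lenient: drops only row 2

# Row counts: original, after how='any', after how='all'
print(len(df_all_nan), len(kept_any), len(kept_all))
```

With how='any' only the single complete row survives, while how='all' keeps every row that still carries at least one value.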
thresh parameter: This lets you specify a minimum number of non-missing values required for a row to be kept. For example, thresh=3 means a row will be kept only if it has at least 3 valid (non-NaN) values.
Using our original df:
# Keep rows with at least 3 non-NaN values
df_thresh3 = df.dropna(thresh=3)
print("\nDataFrame keeping rows with at least 3 non-NaN values:")
print(df_thresh3)
Output:
DataFrame keeping rows with at least 3 non-NaN values:
col1 col2 col3 col4
0 1.0 NaN 11.0 A # Kept (3 non-NaN)
1 2.0 7.0 12.0 B # Kept (4 non-NaN)
2 NaN 8.0 13.0 C # Kept (3 non-NaN)
3 4.0 9.0 14.0 D # Kept (4 non-NaN)
4 5.0 10.0 NaN E # Kept (3 non-NaN)
Here, all rows were kept because each had at least 3 non-missing values. If we increased the threshold:
# Keep rows with at least 4 non-NaN values
df_thresh4 = df.dropna(thresh=4)
print("\nDataFrame keeping rows with at least 4 non-NaN values:")
print(df_thresh4)
Output:
DataFrame keeping rows with at least 4 non-NaN values:
col1 col2 col3 col4
1 2.0 7.0 12.0 B
3 4.0 9.0 14.0 D
Now, only rows 1 and 3 are kept, as they are the only ones with 4 valid values.
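One way to understand thresh is as a filter on the count of non-missing values per row. The following sketch, which assumes the same example data, reproduces thresh=4 by hand; non_missing_per_row and manual are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'col1': [1, 2, np.nan, 4, 5],
    'col2': [np.nan, 7, 8, 9, 10],
    'col3': [11, 12, 13, 14, np.nan],
    'col4': ['A', 'B', 'C', 'D', 'E'],
})

# Count the non-NaN values in each row
non_missing_per_row = df.notna().sum(axis=1)
print(non_missing_per_row.tolist())

# Keep rows whose count reaches the threshold, as thresh=4 does
manual = df[non_missing_per_row >= 4]
print(manual.equals(df.dropna(thresh=4)))
```

Looking at the per-row counts first can help you choose a sensible threshold before committing to one.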
Sometimes, you might want to remove entire columns if they contain missing data, especially if a column has many NaNs or is not essential for your analysis. You can do this by setting the axis parameter to 1 (or 'columns').
# Drop columns containing any NaN values
df_dropped_cols = df.dropna(axis=1) # axis=1 targets columns
print("\nDataFrame after dropping columns with any NaN:")
print(df_dropped_cols)
Output:
DataFrame after dropping columns with any NaN:
col4
0 A
1 B
2 C
3 D
4 E
In our example df, columns col1, col2, and col3 all contained at least one NaN, so they were dropped. Only col4, which had no missing values, remained.
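If you find the number 1 hard to remember, the string form of the axis reads more clearly. This short sketch, again assuming the same example data, checks that both spellings give the same result.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'col1': [1, 2, np.nan, 4, 5],
    'col2': [np.nan, 7, 8, 9, 10],
    'col3': [11, 12, 13, 14, np.nan],
    'col4': ['A', 'B', 'C', 'D', 'E'],
})

by_number = df.dropna(axis=1)
by_name = df.dropna(axis='columns')  # same operation, spelled out

print(by_name.columns.tolist())
print(by_number.equals(by_name))
```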
The how and thresh parameters work similarly when applied to columns:
df.dropna(axis=1, how='all') would drop columns only if all their values are NaN.
df.dropna(axis=1, thresh=4) would keep columns only if they have at least 4 non-NaN values.
# Keep columns with at least 4 non-NaN values
df_thresh4_cols = df.dropna(axis=1, thresh=4)
print("\nDataFrame keeping columns with at least 4 non-NaN values:")
print(df_thresh4_cols)
Output:
DataFrame keeping columns with at least 4 non-NaN values:
col1 col2 col3 col4
0 1.0 NaN 11.0 A
1 2.0 7.0 12.0 B
2 NaN 8.0 13.0 C
3 4.0 9.0 14.0 D
4 5.0 10.0 NaN E
In this case, col1, col2, and col3 each have 4 non-NaN values (out of 5 total rows), and col4 has 5. Since all meet the threshold of 4, no columns are dropped.
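On real datasets the number of rows varies, so it can be more convenient to express the column threshold as a fraction of rows rather than a fixed count. The helper below is a hypothetical sketch, not part of Pandas: drop_sparse_columns and min_fraction are names invented for this example.

```python
import math

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'col1': [1, 2, np.nan, 4, 5],
    'col2': [np.nan, 7, 8, 9, 10],
    'col3': [11, 12, 13, 14, np.nan],
    'col4': ['A', 'B', 'C', 'D', 'E'],
})

def drop_sparse_columns(frame, min_fraction):
    """Hypothetical helper: keep columns whose share of non-NaN values is at least min_fraction."""
    # thresh expects an integer count, so round the required count up
    required = math.ceil(min_fraction * len(frame))
    return frame.dropna(axis=1, thresh=required)

print(drop_sparse_columns(df, 0.75).columns.tolist())  # needs 4 of 5 values
print(drop_sparse_columns(df, 0.9).columns.tolist())   # needs 5 of 5 values
```

With a required fraction of 0.75 every column passes, while 0.9 demands all 5 values and leaves only col4.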
By default, dropna() returns a new DataFrame with the missing values dropped, leaving the original DataFrame unchanged. If you want to modify the original DataFrame directly, you can use the inplace=True parameter.
df_copy = df.copy() # Make a copy to modify
print("\nDataFrame before inplace drop:")
print(df_copy)
df_copy.dropna(inplace=True) # Modifies df_copy directly
print("\nDataFrame after inplace drop:")
print(df_copy)
Output:
DataFrame before inplace drop:
col1 col2 col3 col4
0 1.0 NaN 11.0 A
1 2.0 7.0 12.0 B
2 NaN 8.0 13.0 C
3 4.0 9.0 14.0 D
4 5.0 10.0 NaN E
DataFrame after inplace drop:
col1 col2 col3 col4
1 2.0 7.0 12.0 B
3 4.0 9.0 14.0 D
Use inplace=True with caution. Since it modifies your data directly, it's often safer to assign the result to a new variable unless you are certain you no longer need the original data with the NaN values.
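As a sketch of the safer pattern, the example below assigns the result to a new variable and confirms that the original is untouched. It assumes the same example data; the reset_index step is optional and is included only because the surviving rows keep their original index labels.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'col1': [1, 2, np.nan, 4, 5],
    'col2': [np.nan, 7, 8, 9, 10],
    'col3': [11, 12, 13, 14, np.nan],
    'col4': ['A', 'B', 'C', 'D', 'E'],
})

# Assign the result instead of modifying df directly
df_clean = df.dropna()

print(len(df), len(df_clean))      # the original still has all its rows
print(df_clean.index.tolist())     # kept rows retain their old labels

# Optional: renumber the rows from 0 if the old labels are not needed
df_clean_reset = df_clean.reset_index(drop=True)
print(df_clean_reset.index.tolist())
```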
Dropping missing data is simple, but it comes at a cost: you lose information.
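One way to put a number on that cost is to compare row counts before and after dropping. The sketch below assumes the same example data; lost_fraction and missing_per_column are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'col1': [1, 2, np.nan, 4, 5],
    'col2': [np.nan, 7, 8, 9, 10],
    'col3': [11, 12, 13, 14, np.nan],
    'col4': ['A', 'B', 'C', 'D', 'E'],
})

rows_before = len(df)
rows_after = len(df.dropna())

# Share of rows that the default dropna() would discard
lost_fraction = 1 - rows_after / rows_before
print(f"Rows lost: {lost_fraction:.0%}")

# Share of missing values in each column
missing_per_column = df.isna().mean()
print(missing_per_column)
```

In this small example each numeric column is missing only one value, yet dropping rows discards more than half of the data, because the gaps fall in different rows.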
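This strategy is generally most suitable when only a small fraction of your data is missing, or when a particular row or column has so many missing values that it is not informative.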
Always consider the potential impact of removing data before doing so. If dropping seems too drastic, the next section explores an alternative: filling in the missing values.