Datasets are rarely complete. They often contain gaps, represented as missing values. Before you can analyze data or train a model, you need to know where these gaps are and how extensive they might be. Pandas provides straightforward tools to detect missing data, which is conventionally marked using the special floating-point value NaN (Not a Number). Python's None object is also treated as missing data in Pandas objects.
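Dedicated detection methods are needed because NaN does not compare equal to anything, not even itself, so an `==` test cannot find it. The sketch below shows this, and also shows that the isnull() method described next flags both `np.nan` and `None`. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# NaN is never equal to anything, including itself,
# so an equality check cannot locate missing values.
print(np.nan == np.nan)  # False

# In a numeric Series, None is converted to NaN and the dtype becomes float64.
numeric = pd.Series([1, None, 3])
print(numeric.dtype)  # float64

# In an object Series, None is kept as-is, but pandas still treats it as missing.
mixed = pd.Series(["a", None, np.nan])
print(mixed.isnull().tolist())  # [False, True, True]
```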
isnull() and notnull()
Pandas offers two primary methods for detecting missing values:
isnull(): Returns a boolean object (Series or DataFrame) of the same size as the original, where True indicates a missing value (NaN or None) and False indicates a non-missing value.
notnull(): The inverse of isnull(). It returns True for non-missing values and False for missing values.
Let's see these in action. First, we'll need pandas and numpy imported.
import pandas as pd
import numpy as np
Now, let's create a simple Pandas Series containing some missing data represented by np.nan:
# Create a Series with missing values
data_series = pd.Series([1, np.nan, 3.5, np.nan, 7])
print("Original Series:")
print(data_series)
Original Series:
0    1.0
1    NaN
2    3.5
3    NaN
4    7.0
dtype: float64
Now, we can use isnull() to create a boolean mask identifying the locations of the NaN values:
# Detect missing values
missing_mask = data_series.isnull()
print("\nBoolean mask from isnull():")
print(missing_mask)
Boolean mask from isnull():
0    False
1     True
2    False
3     True
4    False
dtype: bool
As you can see, the resulting Series contains True at indices 1 and 3, corresponding to the NaN values in the original data_series.
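Because the mask shares the original index, you can use it to pull out the labels of the missing entries directly. A short sketch (the name `missing_positions` is just illustrative):

```python
import numpy as np
import pandas as pd

data_series = pd.Series([1, np.nan, 3.5, np.nan, 7])
missing_mask = data_series.isnull()

# Boolean indexing keeps only the entries where the mask is True;
# their index labels are the positions of the gaps.
missing_positions = data_series[missing_mask].index.tolist()
print(missing_positions)  # [1, 3]
```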
Conversely, notnull() identifies the non-missing values:
# Detect non-missing values
not_missing_mask = data_series.notnull()
print("\nBoolean mask from notnull():")
print(not_missing_mask)
Boolean mask from notnull():
0     True
1    False
2     True
3    False
4     True
dtype: bool
This returns True where the data exists and False where it's missing.
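A common use of the notnull() mask is to keep only the observed values. The sketch below does that and also checks that notnull() is exactly the element-wise negation (`~`) of isnull():

```python
import numpy as np
import pandas as pd

data_series = pd.Series([1, np.nan, 3.5, np.nan, 7])

# notnull() is the element-wise negation of isnull()
print(data_series.notnull().equals(~data_series.isnull()))  # True

# Filtering with the notnull() mask keeps only present values; original labels are preserved
present = data_series[data_series.notnull()]
print(present.tolist())        # [1.0, 3.5, 7.0]
print(present.index.tolist())  # [0, 2, 4]
```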
(Note: You will also encounter isna() and notna(). These are the same methods under different names: isnull() and notnull() are aliases of isna() and notna(), so the two pairs can be used interchangeably.)
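A quick check that the two names return identical masks. Pandas also exposes top-level `pd.isna()` and `pd.isnull()` functions, which accept single scalars as well as arrays:

```python
import numpy as np
import pandas as pd

data_series = pd.Series([1, np.nan, 3.5, np.nan, 7])

# The method pairs return identical boolean masks
print(data_series.isna().equals(data_series.isnull()))    # True
print(data_series.notna().equals(data_series.notnull()))  # True

# The top-level function also works on individual values
print(pd.isna(None), pd.isna(np.nan), pd.isna(3.5))  # True True False
```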
These methods work similarly on DataFrames, but they return a boolean DataFrame instead of a Series.
Let's create a DataFrame with missing values:
# Create a DataFrame with missing values
data = {'col_a': [1, 2, np.nan, 4, 5],
        'col_b': [np.nan, 7, 8, np.nan, 10],
        'col_c': [11, 12, 13, 14, 15],
        'col_d': ['apple', 'banana', 'orange', np.nan, 'grape']}
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)
Original DataFrame:
   col_a  col_b  col_c   col_d
0    1.0    NaN     11   apple
1    2.0    7.0     12  banana
2    NaN    8.0     13  orange
3    4.0    NaN     14     NaN
4    5.0   10.0     15   grape
Applying isnull() to this DataFrame gives us:
# Detect missing values in the DataFrame
missing_df_mask = df.isnull()
print("\nBoolean mask DataFrame from isnull():")
print(missing_df_mask)
Boolean mask DataFrame from isnull():
   col_a  col_b  col_c  col_d
0  False   True  False  False
1  False  False  False  False
2   True  False  False  False
3  False   True  False   True
4  False  False  False  False
This boolean DataFrame directly maps the locations of missing values within the original df.
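When the full mask is too large to read, you can collapse it with `any()`. This sketch lists the rows that contain at least one missing value, and then which columns do:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col_a": [1, 2, np.nan, 4, 5],
    "col_b": [np.nan, 7, 8, np.nan, 10],
    "col_c": [11, 12, 13, 14, 15],
    "col_d": ["apple", "banana", "orange", np.nan, "grape"],
})

# any(axis=1) collapses each row of the mask to a single True/False:
# True if at least one value in that row is missing
rows_with_missing = df.isnull().any(axis=1)
print(df[rows_with_missing].index.tolist())  # [0, 2, 3]

# any() with the default axis=0 does the same per column
print(df.isnull().any().tolist())  # [True, True, False, True]
```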
While seeing the exact location of missing values is useful, you often need a summary. How many missing values are there in total, or per column? You can easily achieve this by summing the results of isnull(), because in numerical contexts, True is treated as 1 and False as 0.
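The arithmetic behind this is easy to check on the Series from earlier: summing its boolean mask simply counts the True entries.

```python
import numpy as np
import pandas as pd

data_series = pd.Series([1, np.nan, 3.5, np.nan, 7])
missing_mask = data_series.isnull()

# Python booleans behave as integers: True == 1, False == 0
print(True + True + False)  # 2

# So summing a boolean mask counts its True entries
print(missing_mask.sum())  # 2
```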
To count missing values in each column:
# Count missing values per column
missing_counts_per_column = df.isnull().sum()
print("\nMissing value counts per column:")
print(missing_counts_per_column)
Missing value counts per column:
col_a    1
col_b    2
col_c    0
col_d    1
dtype: int64
This is a very common operation. It quickly tells you that col_a has one missing value, col_b has two, col_c has none, and col_d has one.
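Raw counts are often easier to judge as proportions of the row count. Because the mean of a boolean column is the share of True values, `mean()` on the mask gives the fraction missing per column, and `DataFrame.count()` gives the complementary number of non-missing values. A minimal sketch:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col_a": [1, 2, np.nan, 4, 5],
    "col_b": [np.nan, 7, 8, np.nan, 10],
    "col_c": [11, 12, 13, 14, 15],
    "col_d": ["apple", "banana", "orange", np.nan, "grape"],
})

# mean() of a boolean mask is the fraction of True values in each column
missing_fraction = df.isnull().mean()
print((missing_fraction * 100).round(1).tolist())  # [20.0, 40.0, 0.0, 20.0]

# count() is the complement: the number of non-missing values per column
print(df.count().tolist())  # [4, 3, 5, 4]
```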
To get the total number of missing values in the entire DataFrame, call sum() twice: the first sum() produces the per-column counts shown above, and the second adds those counts together:
# Count total missing values in the DataFrame
total_missing_count = df.isnull().sum().sum()
print(f"\nTotal missing values in the DataFrame: {total_missing_count}")
Total missing values in the DataFrame: 4
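Two related one-liners, offered as alternatives rather than replacements: summing the mask's underlying NumPy array gives the same total in a single call, and chaining `any()` answers the simpler question of whether anything is missing at all.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col_a": [1, 2, np.nan, 4, 5],
    "col_b": [np.nan, 7, 8, np.nan, 10],
    "col_c": [11, 12, 13, 14, 15],
    "col_d": ["apple", "banana", "orange", np.nan, "grape"],
})

# Summing the underlying 2-D boolean NumPy array counts every True at once
total_missing = df.isnull().to_numpy().sum()
print(total_missing)  # 4

# For a yes/no answer, reduce the mask per column and then across columns
has_missing = df.isnull().any().any()
print(has_missing)  # True
```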
Detecting where and how much data is missing is the essential first step in the data cleaning process. Once you've identified these gaps using methods like isnull() and sum(), you can move on to deciding how to handle them, which is the focus of the next sections.
© 2026 ApX Machine Learning