While accessing elements by their numerical position (like array[2] or array[1, 3]) and slicing ranges of data (like array[1:5]) are useful, often you need to select data based on its value or some condition. This is where boolean indexing comes into play. It's a powerful technique that allows you to filter arrays using True/False conditions.
The foundation of boolean indexing is the boolean array. You typically create these arrays by applying comparison operators directly to a NumPy array. NumPy performs the comparison element-wise, returning a new array of the same shape filled with True or False values.
Let's see an example:
import numpy as np
# Create a simple 1D array
data = np.array([1, 5, 2, 8, 3, 7, 4, 6])
print(f"Original array: {data}")
# Apply a condition
is_greater_than_4 = data > 4
print(f"Boolean array (data > 4): {is_greater_than_4}")
Executing this code will output:
Original array: [1 5 2 8 3 7 4 6]
Boolean array (data > 4): [False  True False  True False  True False  True]
Notice how is_greater_than_4 has the same number of elements as data. Each element in is_greater_than_4 is True if the corresponding element in data is greater than 4, and False otherwise.
You can use any of the standard comparison operators: >, <, >=, <=, == (equal to), and != (not equal to).
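As a quick illustration of the other operators, here is a minimal sketch using a made-up array (the name temps and its values are assumptions for this example, not part of the data used elsewhere in this section). It also shows that the resulting mask has dtype bool and the same shape as the input, and that summing a mask counts the matches.

```python
import numpy as np

# Hypothetical sample values, chosen only for illustration
temps = np.array([18, 25, 25, 31])

equal_25 = temps == 25       # True where the element equals 25
not_25 = temps != 25         # True where the element differs from 25
at_least_25 = temps >= 25    # True where the element is 25 or more

print(f"temps == 25: {equal_25}")
print(f"temps != 25: {not_25}")
print(f"temps >= 25: {at_least_25}")

# The mask is a boolean array with the same shape as the input
print(f"dtype: {equal_25.dtype}, same shape: {equal_25.shape == temps.shape}")

# True counts as 1, so summing a mask counts the matching elements
print(f"Number of values >= 25: {at_least_25.sum()}")
```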
Once you have a boolean array, you can use it directly inside the square brackets [] of the original array. This operation selects only the elements from the original array where the corresponding value in the boolean array is True.
# Use the boolean array to select elements
selected_data = data[is_greater_than_4]
print(f"Selected elements (where data > 4): {selected_data}")
# You can also apply the condition directly inside the brackets
selected_directly = data[data < 5]
print(f"Selected elements (where data < 5): {selected_directly}")
Output:
Selected elements (where data > 4): [5 8 7 6]
Selected elements (where data < 5): [1 2 3 4]
The result is a new 1D array containing only the elements that satisfied the condition. This works regardless of the original array's dimensions.
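Because the result is a new array, it is a copy rather than a view of the original. The short sketch below checks this by modifying the selection and confirming that data is unchanged; np.shares_memory is used here only as a sanity check.

```python
import numpy as np

data = np.array([1, 5, 2, 8, 3, 7, 4, 6])

# Boolean indexing returns a copy of the matching elements
selected = data[data > 4]
selected[0] = -1  # modifies only the copy

print(f"Modified selection: {selected}")
print(f"Original array is unchanged: {data}")
print(f"Shares memory with original: {np.shares_memory(data, selected)}")
```

This differs from basic slicing such as data[1:5], which returns a view onto the same memory.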
Let's try it with a 2D array:
matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])
# Select elements greater than 5
selected_matrix_elements = matrix[matrix > 5]
print(f"\nOriginal 2D array:\n{matrix}")
print(f"Selected elements (where matrix > 5): {selected_matrix_elements}")
Output:
Original 2D array:
[[1 2 3]
 [4 5 6]
 [7 8 9]]
Selected elements (where matrix > 5): [6 7 8 9]
Important: Notice that when you index a 2D array with a boolean array of the same shape, the result is a 1D array containing the selected elements. NumPy flattens the output because the True values might be scattered across different rows and columns, not necessarily forming a rectangular sub-array.
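If you need to keep the 2D structure, two common alternatives are sketched below. The first uses a 1D boolean mask with one entry per row, which selects whole rows; the second uses np.where to substitute a fill value (0 here, an arbitrary choice for this example) wherever the condition is False. The name row_mask is introduced only for illustration.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

# A 1D mask with one entry per row keeps entire rows (result stays 2D).
# Row sums are 6, 15 and 24, so the last two rows are selected.
row_mask = matrix.sum(axis=1) > 10
rows_kept = matrix[row_mask]
print(f"Rows with sum > 10:\n{rows_kept}")

# np.where preserves the original shape by filling non-matching positions
filled = np.where(matrix > 5, matrix, 0)
print(f"Values > 5, others set to 0:\n{filled}")
```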
Figure: Scatter plot showing original data points (gray) and points selected using the condition data > 4 (blue).
You often need to filter data based on several conditions simultaneously. You can combine boolean arrays using element-wise logical operators:
& (logical AND): Selects elements where both conditions are True.
| (logical OR): Selects elements where at least one condition is True.
~ (logical NOT): Inverts the boolean array (changes True to False and False to True).
Important: When combining conditions with & or |, you must use parentheses () around each individual condition. This is because of Python's operator precedence rules: the bitwise operators & and | bind more tightly than comparison operators like < or >. Without parentheses, Python evaluates an expression like data > 3 & data < 8 as data > (3 & data) < 8, which raises an error or gives unexpected results.
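To make the precedence issue concrete, the following sketch deliberately triggers the error. It also shows that Python's and keyword fails for the same underlying reason (it needs a single True/False value, not an array), and that np.logical_and is a function form of the same element-wise AND if you prefer to avoid the operator.

```python
import numpy as np

data = np.array([1, 5, 2, 8, 3, 7, 4, 6, 9, 0])

# Missing parentheses: parsed as data > (3 & data) < 8, a chained comparison
try:
    data[data > 3 & data < 8]
except ValueError as err:
    print(f"Without parentheses: {err}")

# Python's `and` tries to reduce each array to one True/False value
try:
    data[(data > 3) and (data < 8)]
except ValueError as err:
    print(f"Using `and`: {err}")

# np.logical_and performs the same element-wise AND as the & operator
mask = np.logical_and(data > 3, data < 8)
print(f"Using np.logical_and: {data[mask]}")
```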
data = np.array([1, 5, 2, 8, 3, 7, 4, 6, 9, 0])
# Condition 1: data > 3
# Condition 2: data < 8
selected_and = data[(data > 3) & (data < 8)]
print(f"Elements where (data > 3) AND (data < 8): {selected_and}")
# Condition 1: data is even (data % 2 == 0)
# Condition 2: data > 5
selected_or = data[(data % 2 == 0) | (data > 5)]
print(f"Elements where (data is even) OR (data > 5): {selected_or}")
# Condition: data is NOT equal to 5
selected_not = data[~(data == 5)]
print(f"Elements where data is NOT equal to 5: {selected_not}")
Output:
Elements where (data > 3) AND (data < 8): [5 7 4 6]
Elements where (data is even) OR (data > 5): [2 8 7 4 6 9 0]
Elements where data is NOT equal to 5: [1 2 8 3 7 4 6 9 0]
Boolean indexing isn't just for selection; it's also incredibly useful for modifying array elements that meet certain criteria. You can use a boolean index on the left side of an assignment operation to change the values of all elements where the condition is True.
# Create an array with positive and negative numbers
arr = np.array([1, -2, 3, -4, 5, -6])
print(f"Original array: {arr}")
# Replace all negative numbers with 0
arr[arr < 0] = 0
print(f"Array after replacing negatives with 0: {arr}")
# Set all elements greater than 3 to a specific value (e.g., 100)
arr[arr > 3] = 100
print(f"Array after setting elements > 3 to 100: {arr}")
Output:
Original array: [ 1 -2  3 -4  5 -6]
Array after replacing negatives with 0: [1 0 3 0 5 0]
Array after setting elements > 3 to 100: [  1   0   3   0 100   0]
This technique is very common in data cleaning and preparation, for example, replacing invalid sensor readings or capping extreme values.
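As a rough sketch of such a cleaning step, the example below assumes a hypothetical sensor log in which -999.0 marks a failed reading and 50.0 is the largest plausible value. Both numbers, and the names readings and cleaned, are invented for illustration.

```python
import numpy as np

# Hypothetical sensor log: -999.0 is an assumed sentinel for a failed reading
readings = np.array([21.5, -999.0, 22.1, 150.0, 20.9, -999.0])

# Work on a copy so the raw readings stay available
cleaned = readings.copy()

# Mark invalid readings as missing
cleaned[cleaned == -999.0] = np.nan

# Cap extreme values at an assumed upper limit of 50.0
# (comparisons with NaN are False, so missing values are left alone)
cleaned[cleaned > 50.0] = 50.0

print(f"Raw readings:     {readings}")
print(f"Cleaned readings: {cleaned}")
print(f"Missing values:   {np.isnan(cleaned).sum()}")
```

Copying first is a design choice: in-place assignment through a boolean index overwrites the original values, so keeping the raw array makes it possible to revisit the cleaning rules later.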
Boolean indexing provides a flexible and readable way to filter and manipulate your NumPy arrays based on the data itself, forming a significant part of efficient data analysis workflows.
© 2025 ApX Machine Learning