Once you have created a NumPy array, the next fundamental operation is accessing its elements or subsets of its elements. Similar to Python lists, NumPy arrays use zero-based indexing and support slicing, but NumPy extends these concepts to multiple dimensions and introduces more powerful indexing techniques crucial for data manipulation in machine learning.
Accessing elements in a one-dimensional NumPy array works much like accessing elements in a Python list. You use square brackets [] with the integer index of the element you want. Remember that indexing starts at 0, and that negative indices count backward from the end, so -1 refers to the last element.
import numpy as np
# Create a 1D array
arr1d = np.arange(10) # array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
# Access the first element
print(arr1d[0])
# Output: 0
# Access the fifth element
print(arr1d[4])
# Output: 4
# Access the last element
print(arr1d[-1])
# Output: 9
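It helps to see two related behaviors side by side: negative indices and out-of-range indices. The sketch below uses only np.arange, and the variable names are illustrative. It prints the IndexError message rather than quoting it, because the exact wording may differ between NumPy versions.

```python
import numpy as np

arr = np.arange(10)  # [0 1 2 3 4 5 6 7 8 9]

# Negative indices count backward from the end: -1 is the last element, -3 the third from last
last = arr[-1]
third_from_last = arr[-3]
print(last, third_from_last)  # 9 7

# A single integer index outside the range [-10, 10) raises IndexError
try:
    arr[10]
    raised = False
except IndexError as err:
    raised = True
    print("IndexError:", err)
```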
You can also modify elements using indexing:
# Modify the first element
arr1d[0] = 100
print(arr1d)
# Output: [100   1   2   3   4   5   6   7   8   9]
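Slicing allows you to select a range of elements. The syntax is start:stop:step. Here start is the index of the first element to include (inclusive), stop is the index of the first element not to include (exclusive), and step defines the jump between elements. Any of the three can be omitted. With a positive step, an omitted start defaults to the beginning of the array and an omitted stop defaults to the end. An omitted step defaults to 1.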
# Slice elements from index 2 up to (but not including) index 5
print(arr1d[2:5])
# Output: [2 3 4]
# Slice from the beginning up to (but not including) index 4
print(arr1d[:4])
# Output: [100   1   2   3]
# Slice from index 5 to the end
print(arr1d[5:])
# Output: [5 6 7 8 9]
# Slice every second element
print(arr1d[::2])
# Output: [100   2   4   6   8]
# Slice elements from index 1 to 7 with a step of 3
print(arr1d[1:8:3])
# Output: [1 4 7]
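Two more slicing behaviors are worth knowing. A negative step walks through the array backward. Slice bounds that fall past the end are clipped instead of raising an error. Here is a minimal sketch on a fresh array, with illustrative variable names:

```python
import numpy as np

arr = np.arange(10)  # [0 1 2 3 4 5 6 7 8 9]

# A negative step walks backward; with start and stop omitted, step -1 reverses the array
reversed_arr = arr[::-1]
print(reversed_arr)  # [9 8 7 6 5 4 3 2 1 0]

# Start at index 8 and move back 3 at a time, stopping before index 1
backward = arr[8:1:-3]
print(backward)  # [8 5 2]

# Slice bounds past the end are clipped rather than raising an error
tail = arr[7:100]
print(tail)  # [7 8 9]
```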
A significant difference between NumPy array slicing and Python list slicing is that NumPy array slices are views into the original array data. This means they share the same underlying data buffer. Modifying a slice will modify the original array. This behavior is designed for performance, avoiding unnecessary data copying, especially with large datasets common in machine learning.
arr_original = np.arange(5) # array([0, 1, 2, 3, 4])
arr_slice = arr_original[1:4] # Get a slice (view)
print(f"Original array before slice modification: {arr_original}")
print(f"Slice before modification: {arr_slice}")
# Modify the first element of the slice
arr_slice[0] = 99
print(f"Slice after modification: {arr_slice}")
print(f"Original array after slice modification: {arr_original}")
# Output:
# Original array before slice modification: [0 1 2 3 4]
# Slice before modification: [1 2 3]
# Slice after modification: [99  2  3]
# Original array after slice modification: [ 0 99  2  3  4]
Notice how changing arr_slice[0] also changed arr_original[1]. If you need a distinct copy of a slice, use the copy() method:
arr_original = np.arange(5)
arr_slice_copy = arr_original[1:4].copy() # Create an explicit copy
arr_slice_copy[0] = 99 # Modify the copy
print(f"Slice copy: {arr_slice_copy}")
print(f"Original array: {arr_original}") # Original remains unchanged
# Output:
# Slice copy: [99  2  3]
# Original array: [0 1 2 3 4]
Understanding the view versus copy behavior is important for preventing unintended side effects in your data manipulation code.
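When you are not sure whether an array is a view, you can check. The sketch below uses np.shares_memory and the .base attribute, both part of NumPy's public API. The variable names are illustrative.

```python
import numpy as np

arr_original = np.arange(5)
view = arr_original[1:4]             # basic slice: a view
arr_copy = arr_original[1:4].copy()  # explicit copy: a new data buffer

# np.shares_memory reports whether two arrays use overlapping memory
view_shares = np.shares_memory(arr_original, view)
copy_shares = np.shares_memory(arr_original, arr_copy)
print(view_shares, copy_shares)  # True False

# A view's .base refers to the array that owns the data; an array that owns its data has base None
print(view.base is arr_original)  # True
print(arr_copy.base is None)      # True
```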
NumPy extends indexing and slicing naturally to arrays with more than one dimension. You provide indices or slices for each dimension, separated by commas within the square brackets.
Consider a 2D array (a matrix):
arr2d = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]])
To access a single element, provide the row index followed by the column index.
# Access element at row 1, column 2 (value 6)
element = arr2d[1, 2]
print(element)
# Output: 6
# Alternative syntax (less common): arr2d[1][2]
# This works but is less efficient as it creates an intermediate 1D array
You can slice along each dimension:
# Get the first row (row index 0, all columns)
first_row = arr2d[0, :] # or simply arr2d[0]
print(first_row)
# Output: [1 2 3]
# Get the second column (all rows, column index 1)
second_col = arr2d[:, 1]
print(second_col)
# Output: [2 5 8]
# Get a sub-array: rows 0 and 1, columns 1 and 2
sub_array = arr2d[0:2, 1:3]
print(sub_array)
# Output:
# [[2 3]
#  [5 6]]
# Get the bottom-right 2x2 matrix
bottom_right = arr2d[1:, 1:]
print(bottom_right)
# Output:
# [[5 6]
#  [8 9]]
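An integer index removes an axis, while a slice keeps it, even a slice that covers only one row or column. This matters when later code expects a 2D input rather than a 1D one. Steps also work independently on each axis. Here is a quick sketch using the same 3x3 values:

```python
import numpy as np

arr2d = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]])

# An integer index removes that axis; a slice (even of length 1) keeps it
print(arr2d[0].shape)       # (3,)   1D row
print(arr2d[0:1].shape)     # (1, 3) 2D, one row
print(arr2d[:, 1].shape)    # (3,)   1D column values
print(arr2d[:, 1:2].shape)  # (3, 1) 2D column vector

# Steps apply per axis: every other row and every other column
corners = arr2d[::2, ::2]
print(corners)
# [[1 3]
#  [7 9]]
```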
Like 1D slices, multi-dimensional slices are also views into the original array. Modifying the slice will affect the original arr2d.
# Modify the sub_array slice
sub_array[0, 0] = 99 # Changes arr2d[0, 1]
print(sub_array)
# Output:
# [[99  3]
#  [ 5  6]]
print(arr2d) # Original array is modified
# Output:
# [[ 1 99  3]
#  [ 4  5  6]
#  [ 7  8  9]]
Use arr2d[0:2, 1:3].copy() if you need an independent copy of the sub-array.
Boolean indexing, often called masking, is a powerful technique where you use a boolean array (containing True or False values) to select elements from another array. The boolean array must match the shape of the dimension(s) it indexes. A mask with the same shape as the whole array selects individual elements. A 1D mask whose length equals the number of rows selects whole rows.
Typically, you create the boolean array based on a condition applied to the array itself.
data = np.array([1, -2, 3, -4, 5, -6])
# Create a boolean mask for positive values
positive_mask = data > 0 # array([ True, False, True, False, True, False])
print(positive_mask)
# Use the mask to select only positive elements
positive_values = data[positive_mask]
print(positive_values)
# Output: [1 3 5]
You can apply the condition directly within the brackets:
negative_values = data[data < 0]
print(negative_values)
# Output: [-2 -4 -6]
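The shape rule for masks is easiest to see on a 2D array. In this sketch, X is a hypothetical feature matrix with made-up values. A 1D mask over the rows keeps whole rows. A full-shape mask picks out individual elements. A mask of the wrong length is rejected with an IndexError.

```python
import numpy as np

# Hypothetical feature matrix: 4 samples (rows) x 2 features (columns)
X = np.array([[0.5, 1.0],
              [2.0, 3.0],
              [1.5, 0.2],
              [3.0, 4.0]])

# A 1D mask with one entry per row (length 4 = size of axis 0) selects whole rows
row_mask = X[:, 0] > 1.0
selected_rows = X[row_mask]
print(row_mask)             # [False  True  True  True]
print(selected_rows.shape)  # (3, 2)

# A mask with the same shape as X selects individual elements and returns a 1D array
large_values = X[X > 1.0]
print(large_values.shape)   # (5,)

# A mask whose length does not match the indexed axis raises IndexError
try:
    X[np.array([True, False])]
    mismatch_raised = False
except IndexError:
    mismatch_raised = True
print(mismatch_raised)      # True
```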
Boolean indexing can be combined with assignment to modify elements that meet a condition:
# Set all negative values to 0
data[data < 0] = 0
print(data)
# Output: [1 0 3 0 5 0]
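Masked assignment changes the array in place. If you need to keep the original values, one option is to copy first and assign into the copy. Another option is np.where. It is an alternative technique rather than masking itself, and it builds a new array. Here is a minimal sketch with illustrative names:

```python
import numpy as np

data = np.array([1, -2, 3, -4, 5, -6])

# Option 1: copy first, then assign through the mask on the copy only
clipped = data.copy()
clipped[clipped < 0] = 0

# Option 2: np.where(condition, x, y) takes x where condition is True, y elsewhere
clipped_where = np.where(data < 0, 0, data)

print(clipped)        # [1 0 3 0 5 0]
print(clipped_where)  # [1 0 3 0 5 0]
print(data)           # [ 1 -2  3 -4  5 -6]  (unchanged)
```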
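You can combine multiple conditions using NumPy's bitwise logical operators: & (AND), | (OR), and ~ (NOT). On boolean arrays, NumPy applies them element by element. Important: use these operators, not the Python keywords and, or, not. The keywords need a single True or False for the whole array, so NumPy raises a ValueError for arrays with more than one element. Wrap each condition in parentheses, because & and | bind more tightly than comparison operators such as > and <.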
arr = np.arange(12).reshape((3, 4))
print("Original array:\n", arr)
# Output:
# Original array:
#  [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]
# Select elements greater than 3 AND less than 9
mask = (arr > 3) & (arr < 9)
print("Mask ( > 3 & < 9):\n", mask)
# Output:
# Mask ( > 3 & < 9):
#  [[False False False False]
#  [ True  True  True  True]
#  [ True False False False]]
print("Selected elements:", arr[mask])
# Output: Selected elements: [4 5 6 7 8]
# Select elements less than 2 OR greater than 9
print("Selected elements (< 2 | > 9):", arr[(arr < 2) | (arr > 9)])
# Output: Selected elements (< 2 | > 9): [ 0  1 10 11]
# Select elements NOT greater than 5
print("Selected elements (~( > 5)):", arr[~(arr > 5)])
# Output: Selected elements (~( > 5)): [0 1 2 3 4 5]
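The following sketch reuses the same 3x4 array to show why both the operators and the parentheses matter. Each incorrect form raises ValueError. The flag variables exist only to record what happened.

```python
import numpy as np

arr = np.arange(12).reshape((3, 4))

# `and` asks Python for a single True/False per whole array, which is ambiguous
try:
    arr[(arr > 3) and (arr < 9)]
    keyword_failed = False
except ValueError as err:
    keyword_failed = True
    print("ValueError:", err)

# Without parentheses, & binds first: arr > 3 & arr < 9 parses as the chained
# comparison arr > (3 & arr) < 9, which also needs a single truth value
try:
    arr[arr > 3 & arr < 9]
    precedence_failed = False
except ValueError:
    precedence_failed = True

# Parenthesize each comparison and combine with &
selected = arr[(arr > 3) & (arr < 9)]
print(selected)  # [4 5 6 7 8]
```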
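Unlike basic slicing, boolean indexing always returns a copy of the data, not a view. Modifying the result of a boolean selection will not affect the original array. Assignment through a mask, as in data[data < 0] = 0 above, is different: NumPy writes the values directly into the original array.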
arr_original = np.arange(5)
bool_selection = arr_original[arr_original > 2] # Elements 3, 4
print(f"Original: {arr_original}")
print(f"Selection: {bool_selection}")
bool_selection[0] = 99 # Modify the selection
print(f"Selection after modification: {bool_selection}")
print(f"Original after modification: {arr_original}") # Original is unchanged
# Output:
# Original: [0 1 2 3 4]
# Selection: [3 4]
# Selection after modification: [99  4]
# Original after modification: [0 1 2 3 4]
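A boolean selection is a copy, so chaining an assignment onto it (arr[mask][0] = 99) modifies only a temporary array and leaves arr unchanged. Assigning through the mask in a single step writes into the original. Here is a small sketch, with np.shares_memory confirming the copy:

```python
import numpy as np

arr = np.arange(5)  # [0 1 2 3 4]
mask = arr > 2

# Reading through a mask returns a new array that does not share memory with arr
selection = arr[mask]
print(np.shares_memory(arr, selection))  # False

# Chained indexing writes into a temporary copy, so arr is unchanged
arr[mask][0] = 99
print(arr)  # [0 1 2 3 4]

# A single assignment through the mask writes into arr itself
arr[mask] = -1
print(arr)  # [ 0  1  2 -1 -1]
```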
Fancy indexing allows you to use arrays of integers to specify the indices you want to select. This provides immense flexibility for selecting arbitrary elements or reordering elements.
Pass a list or NumPy array of indices to select specific elements in the desired order.
arr1d = np.array([10, 20, 30, 40, 50, 60])
# Select elements at indices 1, 3, and 4
indices = [1, 3, 4]
selected = arr1d[indices]
print(selected)
# Output: [20 40 50]
# Select elements in a different order, including duplicates
indices = np.array([5, 0, 5, 2])
selected_ordered = arr1d[indices]
print(selected_ordered)
# Output: [60 10 60 30]
You can also use fancy indexing to modify specific elements:
arr1d[[0, 2]] = -99 # Modify elements at index 0 and 2
print(arr1d)
# Output: [-99  20 -99  40  50  60]
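Repeated indices, which are allowed when reading, need care when you modify values. With an in-place operator such as +=, each repeated position is updated only once. np.add.at is NumPy's unbuffered alternative: it applies the operation once for every occurrence of an index. Here is a minimal sketch with illustrative names:

```python
import numpy as np

idx = np.array([0, 2, 0, 0])  # index 0 appears three times

# In-place arithmetic with repeated indices: each position is updated once
counts = np.zeros(3, dtype=int)
counts[idx] += 1
print(counts)  # [1 0 1]

# np.add.at applies the addition once per occurrence (unbuffered)
counts_at = np.zeros(3, dtype=int)
np.add.at(counts_at, idx, 1)
print(counts_at)  # [3 0 1]
```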
Fancy indexing works in multiple dimensions as well. You can provide arrays of indices for each dimension.
arr2d = np.arange(12).reshape((3, 4))
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]
# Select specific rows (e.g., row 0 and row 2)
selected_rows = arr2d[[0, 2]]
print("Selected rows:\n", selected_rows)
# Output:
# Selected rows:
#  [[ 0  1  2  3]
#  [ 8  9 10 11]]
# Select specific columns (e.g., column 1 and column 3)
selected_cols = arr2d[:, [1, 3]]
print("Selected columns:\n", selected_cols)
# Output:
# Selected columns:
#  [[ 1  3]
#  [ 5  7]
#  [ 9 11]]
A common use case is selecting specific elements using pairs of row and column indices. If you provide two index arrays rows and cols, NumPy pairs them up: (rows[0], cols[0]), (rows[1], cols[1]), etc. The shape of the resulting array matches the shape of the index arrays.
# Select elements at coordinates (0, 1), (1, 2), and (2, 0)
rows = np.array([0, 1, 2])
cols = np.array([1, 2, 0])
selected_elements = arr2d[rows, cols]
print("Selected elements at (0,1), (1,2), (2,0):", selected_elements)
# Output: Selected elements at (0,1), (1,2), (2,0): [1 6 8]
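Paired selection is not the same as selecting a rectangular block. To take every combination of some rows and some columns, one option is np.ix_, which turns the index arrays into an open mesh. This sketch contrasts the two approaches on the same 3x4 array:

```python
import numpy as np

arr2d = np.arange(12).reshape((3, 4))
rows = np.array([0, 2])
cols = np.array([1, 3])

# Paired: selects (0, 1) and (2, 3), giving two elements
paired = arr2d[rows, cols]
print(paired)  # [ 1 11]

# np.ix_ builds an open mesh, so every listed row is combined with every listed column
block = arr2d[np.ix_(rows, cols)]
print(block)
# [[ 1  3]
#  [ 9 11]]
```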
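Similar to boolean indexing, fancy indexing always returns a copy of the data, not a view. Assignment through a fancy index, as in arr1d[[0, 2]] = -99 above, still writes into the original array.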
arr_original = np.arange(10)
fancy_selection = arr_original[[1, 5, 8]]
print(f"Original: {arr_original}")
print(f"Selection: {fancy_selection}")
fancy_selection[0] = 99 # Modify the selection
print(f"Selection after modification: {fancy_selection}")
print(f"Original after modification: {arr_original}") # Original is unchanged
# Output:
# Original: [0 1 2 3 4 5 6 7 8 9]
# Selection: [1 5 8]
# Selection after modification: [99  5  8]
# Original after modification: [0 1 2 3 4 5 6 7 8 9]
You can combine basic slicing, boolean indexing, and fancy indexing for more complex selections. Whether you get a view or a copy depends on the kinds of index involved. If the index uses only slices, optionally mixed with single integers, the result is a view. If an integer array or a boolean array appears anywhere in the index, the result is a copy.
arr2d = np.arange(12).reshape((3, 4))
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]
# Select rows 0 and 2, and columns 1 to 3 (exclusive)
# Fancy index for rows, slice for columns
subset1 = arr2d[[0, 2], 1:3]
print("Rows [0, 2], Columns 1:3:\n", subset1)
# Output:
# Rows [0, 2], Columns 1:3:
#  [[ 1  2]
#  [ 9 10]]
# Select from row 1, columns where the element in that row is > 5
# Basic index for row, boolean index for columns
subset2 = arr2d[1, arr2d[1] > 5]
print("Row 1, elements > 5:", subset2)
# Output: Row 1, elements > 5: [6 7]
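To check which combined selections are views, you can apply np.shares_memory to each expression. This short sketch rebuilds the same 3x4 array; the variable names are illustrative.

```python
import numpy as np

arr2d = np.arange(12).reshape((3, 4))

basic = arr2d[0:2, 1:3]          # slices only
mixed = arr2d[[0, 2], 1:3]       # integer array + slice
masked = arr2d[1, arr2d[1] > 5]  # single integer + boolean mask

print(np.shares_memory(arr2d, basic))   # True  (view)
print(np.shares_memory(arr2d, mixed))   # False (copy)
print(np.shares_memory(arr2d, masked))  # False (copy)
```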
Mastering these indexing and slicing techniques is fundamental for selecting, filtering, and modifying the specific parts of your data required for analysis and preparing data for machine learning models. The distinction between views and copies is particularly important to grasp to avoid unexpected behavior and manage memory efficiently.
© 2025 ApX Machine Learning