Now that you have grasped the fundamentals of NumPy arrays, indexing, mathematical operations, broadcasting, and statistical functions, it's time to put these concepts into practice. Working through practical exercises is essential for solidifying your understanding and building confidence in manipulating numerical data efficiently.
This section provides hands-on problems designed to reinforce the techniques covered in this chapter. We'll work through examples that involve creating arrays, selecting data, performing calculations, and applying statistical methods, all skills that apply directly to data preparation and analysis in machine learning.
Imagine you have collected temperature readings (in Celsius) from three different sensors over four consecutive time points. The readings are: Sensor 1: [22.5, 23.1, 22.8, 23.5], Sensor 2: [21.8, 22.2, 22.0, 22.5], Sensor 3: [23.0, 23.3, 23.1, 23.6].
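Task:
1. Create a 2D NumPy array in which each row holds one sensor's readings and each column corresponds to a time point.
2. Calculate the average temperature recorded by each sensor.
3. Calculate the average temperature across all sensors at each time point.
4. Find the overall maximum temperature recorded.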
Solution:
import numpy as np
# 1. Create the 2D array
sensor_data = np.array([
    [22.5, 23.1, 22.8, 23.5],
    [21.8, 22.2, 22.0, 22.5],
    [23.0, 23.3, 23.1, 23.6]
])
print("Sensor Data Array:")
print(sensor_data)
print("-" * 20)
# 2. Average temperature per sensor (across columns, axis=1)
avg_per_sensor = np.mean(sensor_data, axis=1)
print("Average Temperature per Sensor:")
print(avg_per_sensor)
print("-" * 20)
# 3. Average temperature per time point (across rows, axis=0)
avg_per_timepoint = np.mean(sensor_data, axis=0)
print("Average Temperature per Time Point:")
print(avg_per_timepoint)
print("-" * 20)
# 4. Overall maximum temperature
max_temp = np.max(sensor_data)
print(f"Overall Maximum Temperature: {max_temp:.1f}°C")
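A handy rule of thumb for axis arguments is that the axis you pass is the one that disappears from the result's shape. The following sketch reuses this exercise's sensor readings to print those shapes. It also shows keepdims=True, which keeps the collapsed axis so the result can be broadcast back against the data, and it uses np.argmax() with np.unravel_index() to locate the maximum reading. The variable names (per_sensor, per_sensor_col, deviations, and so on) are illustrative additions, not part of the original solution.

```python
import numpy as np

# Same readings as above: 3 sensors (rows) x 4 time points (columns)
sensor_data = np.array([
    [22.5, 23.1, 22.8, 23.5],
    [21.8, 22.2, 22.0, 22.5],
    [23.0, 23.3, 23.1, 23.6],
])

print(sensor_data.shape)                 # (3, 4)
per_sensor = sensor_data.mean(axis=1)    # axis 1 (time points) is collapsed
per_time = sensor_data.mean(axis=0)      # axis 0 (sensors) is collapsed
print(per_sensor.shape, per_time.shape)  # (3,) (4,)

# keepdims=True keeps the collapsed axis with length 1,
# so the result can be broadcast back against the original data
per_sensor_col = sensor_data.mean(axis=1, keepdims=True)
print(per_sensor_col.shape)              # (3, 1)
deviations = sensor_data - per_sensor_col  # (3, 4) - (3, 1): each row minus its own mean

# argmax returns a flat index; unravel_index converts it to (row, column)
sensor_idx, time_idx = np.unravel_index(np.argmax(sensor_data), sensor_data.shape)
print(f"Max reading from sensor {sensor_idx + 1} at time point {time_idx + 1}")
```

Without keepdims=True, subtracting the (3,) result from the (3, 4) array would fail to broadcast, because the trailing dimensions (3 and 4) do not match.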
Explanation:
1. We create the sensor_data array by passing a list of lists to np.array(). Each inner list becomes one row.
2. To find the average per sensor, we use np.mean() and specify axis=1. This tells NumPy to compute the mean along the horizontal axis (across the columns of each row), producing one value per sensor.
3. Specifying axis=0 computes the mean along the vertical axis (down the rows of each column), giving the average temperature at each time point.
4. Calling np.max() without an axis argument finds the maximum value in the entire array.

Standard scaling (or Z-score normalization) is a common preprocessing step in machine learning. It transforms data so that it has a mean of 0 and a standard deviation of 1. The formula for a data point x is:

$$Z = \frac{x - \mu}{\sigma}$$

where μ is the mean of the data and σ is its standard deviation.
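As a worked example, take the data from the task that follows and the value x = 25. Using the population standard deviation (dividing by N, which is what np.std() does by default), the numbers work out as follows, rounded to two decimals:

$$\mu = \frac{10 + 15 + 12 + 18 + 25 + 11 + 16}{7} = \frac{107}{7} \approx 15.29$$

$$\sigma = \sqrt{\frac{1}{7}\sum_{i=1}^{7}(x_i - \mu)^2} \approx \sqrt{\frac{159.43}{7}} \approx 4.77$$

$$Z = \frac{25 - 15.29}{4.77} \approx 2.04$$

So the reading 25 lies roughly two standard deviations above the mean.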
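Task:
1. Create a NumPy array from the data [10, 15, 12, 18, 25, 11, 16].
2. Calculate the mean and standard deviation of the array.
3. Apply standard scaling to produce an array of Z-scores.
Solution: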
import numpy as np
# 1. Create the array
data = np.array([10, 15, 12, 18, 25, 11, 16])
print("Original Data:")
print(data)
print("-" * 20)
# 2. Calculate mean and standard deviation
mean_val = np.mean(data)
std_dev = np.std(data)
print(f"Mean (μ): {mean_val:.2f}")
print(f"Standard Deviation (σ): {std_dev:.2f}")
print("-" * 20)
# 3. Apply standard scaling using broadcasting
normalized_data = (data - mean_val) / std_dev
print("Normalized Data (Z-scores):")
print(normalized_data)
print("-" * 20)
# Verification: Check mean and std dev of normalized data
print(f"Mean of Normalized Data: {np.mean(normalized_data):.2f}")
print(f"Std Dev of Normalized Data: {np.std(normalized_data):.2f}")
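One detail worth checking is which standard deviation np.std() computes. By default it divides by N (the population standard deviation, ddof=0). Passing ddof=1 divides by N - 1 instead, giving the sample standard deviation used in many statistics texts. The following sketch reuses the same data to compare the two. It also shows that if you scale with the ddof=1 value, the default np.std() of the result comes out slightly below 1.

```python
import numpy as np

data = np.array([10, 15, 12, 18, 25, 11, 16])

pop_std = np.std(data)             # divides by N (default, ddof=0)
sample_std = np.std(data, ddof=1)  # divides by N - 1
print(f"ddof=0: {pop_std:.4f}, ddof=1: {sample_std:.4f}")

# Scaling with the sample std gives Z-scores whose ddof=1 std is 1,
# while their default (ddof=0) std is sqrt((N - 1) / N), slightly below 1
z_sample = (data - data.mean()) / sample_std
print(f"std (ddof=0): {np.std(z_sample):.4f}, std (ddof=1): {np.std(z_sample, ddof=1):.4f}")
```

The solution above uses the default ddof=0 for both the scaling and the verification, which is why its check reports a standard deviation of 1.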
Explanation:
1. We compute the mean (μ) and standard deviation (σ) of the data with np.mean() and np.std().
2. The scaling itself is a single line: normalized_data = (data - mean_val) / std_dev. Here, mean_val (a scalar) is subtracted from every element of the data array (broadcasting). The resulting array is then divided element-wise by std_dev (another scalar, again via broadcasting). This applies the Z-score formula to the entire array without explicit loops.
3. The verification step confirms that normalized_data has a mean very close to 0 and a standard deviation very close to 1, as expected. Floating-point rounding means the values may not be exactly 0 and 1.

Consider a dataset representing the scores of students on two different tests: scores = np.array([[85, 92], [78, 81], [91, 95], [60, 65], [72, 79]]). Each row is a student, column 0 is the Test 1 score, and column 1 is the Test 2 score.
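Task:
1. Select the scores of all students who scored 80 or higher on Test 1.
2. Identify the students who scored below 70 on either test.
3. Apply a 3-point curve to all Test 2 scores below 80, without modifying the original scores array.
Solution: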
import numpy as np
scores = np.array([[85, 92], [78, 81], [91, 95], [60, 65], [72, 79]])
print("Original Scores:")
print(scores)
print("-" * 20)
# 1. Select students with Test 1 score >= 80
high_scorers_test1 = scores[scores[:, 0] >= 80]
print("Scores of Students with Test 1 >= 80:")
print(high_scorers_test1)
print("-" * 20)
# 2. Identify students scoring < 70 on either test
low_scorers_mask = (scores[:, 0] < 70) | (scores[:, 1] < 70)
low_scorers = scores[low_scorers_mask]
print("Scores of Students with < 70 on Either Test:")
print(low_scorers)
print("-" * 20)
# 3. Apply curve to Test 2 for scores < 80
# Create a copy to avoid modifying the original array
curved_scores = scores.copy()
# Create a boolean mask for Test 2 scores < 80
test2_curve_mask = curved_scores[:, 1] < 80
# Add 3 points using the mask for selection on the relevant column
curved_scores[test2_curve_mask, 1] += 3
print("Scores after applying curve to Test 2 (< 80):")
print(curved_scores)
print("-" * 20)
print("Original Scores (unchanged):")
print(scores)
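The same results can also be obtained with two alternative idioms. In the following sketch, (scores < 70).any(axis=1) compares the whole array at once and then asks, row by row, whether any test is below 70. np.where() builds a curved copy of the Test 2 column instead of modifying an array in place, and np.column_stack() reassembles the full table. These are equivalent alternatives to the solution above, not a replacement for it.

```python
import numpy as np

scores = np.array([[85, 92], [78, 81], [91, 95], [60, 65], [72, 79]])

# Row-wise "any": True if at least one score in the row is below 70
low_mask_any = (scores < 70).any(axis=1)
low_mask_or = (scores[:, 0] < 70) | (scores[:, 1] < 70)
print(np.array_equal(low_mask_any, low_mask_or))  # both approaches give the same mask

# np.where(condition, value_if_true, value_if_false) returns a new array,
# so the original scores are never modified
test2 = scores[:, 1]
curved_test2 = np.where(test2 < 80, test2 + 3, test2)

# Reassemble a full (5, 2) table with the curved Test 2 column
curved_scores = np.column_stack((scores[:, 0], curved_test2))
print(curved_scores)
```

The in-place approach in the solution is convenient when you already hold a copy. The np.where() version suits situations where you want an expression that produces a new array.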
Explanation:
1. scores[:, 0] selects all rows (:) and the first column (0), which holds the Test 1 scores. The condition >= 80 creates a boolean array ([True, False, True, False, False]). Using this boolean array as an index into scores selects only the rows where the condition is True.
2. We build one condition per test: scores[:, 0] < 70 for Test 1 and scores[:, 1] < 70 for Test 2. The element-wise OR operator (|) combines them into a mask that is True if a student scored below 70 on at least one test. The parentheses around each comparison are required because | binds more tightly than < in Python. This mask is then used to select the relevant rows.
3. We first call scores.copy() to create curved_scores. Without the copy, modifications would affect the original scores array. We then create a mask, test2_curve_mask, specifically for Test 2 scores below 80. The expression curved_scores[test2_curve_mask, 1] selects the rows indicated by the mask, but only in the second column (Test 2 scores), and += 3 adds 3 directly to those selected elements.

You are given a 1D array representing pixel values from a grayscale image snippet: pixels = np.arange(1, 13).
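Task:
1. Reshape pixels into a matrix with 3 rows and 4 columns.
2. Create a transformation matrix transform_matrix with values [[0.5, 0.5], [1, 0], [0, 1], [0.2, 0.8]].
3. Multiply the reshaped pixel matrix by transform_matrix. What is the shape of the resulting matrix?
Solution: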
import numpy as np
pixels = np.arange(1, 13)
print("Original Pixel Array:")
print(pixels)
print("-" * 20)
# 1. Reshape the array
pixel_matrix = pixels.reshape((3, 4))
print("Reshaped Pixel Matrix (3x4):")
print(pixel_matrix)
print("-" * 20)
# 2. Create the transformation matrix
transform_matrix = np.array([
    [0.5, 0.5],
    [1, 0],
    [0, 1],
    [0.2, 0.8]
])
print("Transformation Matrix (4x2):")
print(transform_matrix)
print("-" * 20)
# 3. Perform matrix multiplication
result_matrix = np.dot(pixel_matrix, transform_matrix)
# Alternative syntax: result_matrix = pixel_matrix @ transform_matrix
print("Result of Matrix Multiplication (pixel_matrix @ transform_matrix):")
print(result_matrix)
print("-" * 20)
print(f"Shape of the Resulting Matrix: {result_matrix.shape}")
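Shape mismatches are the most common source of errors in these operations. The following sketch uses the same pixel data and shows reshape() inferring one dimension from -1. It then catches the ValueError that NumPy raises in three failure cases: when the element count does not divide evenly, when * is applied to the (3, 4) and (4, 2) matrices (which cannot be broadcast together), and when the multiplication order is reversed so that the inner dimensions no longer match. The attempts dictionary is just an illustrative way to run the failing cases in a loop.

```python
import numpy as np

pixels = np.arange(1, 13)

# -1 asks NumPy to infer that dimension from the total size (12 / 3 = 4)
pixel_matrix = pixels.reshape(3, -1)
print(pixel_matrix.shape)  # (3, 4)

transform_matrix = np.array([
    [0.5, 0.5],
    [1, 0],
    [0, 1],
    [0.2, 0.8],
])

# (3, 4) @ (4, 2): the inner dimensions (4 and 4) match, so the result is (3, 2)
result = pixel_matrix @ transform_matrix
print(result.shape)

attempts = {
    "pixels.reshape(5, -1)": lambda: pixels.reshape(5, -1),  # 12 is not divisible by 5
    "pixel_matrix * transform_matrix": lambda: pixel_matrix * transform_matrix,  # not broadcastable
    "transform_matrix @ pixel_matrix": lambda: transform_matrix @ pixel_matrix,  # (4, 2) @ (3, 4): 2 != 3
}
raised = []
for label, attempt in attempts.items():
    try:
        attempt()
    except ValueError as err:
        raised.append(label)
        print(f"{label} -> ValueError: {err}")
```

Reading the error message alongside the shapes involved is usually the quickest way to see which dimension is out of line.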
Explanation:
1. np.arange(1, 13) creates the array [1, 2, ..., 12]. The stop value 13 is excluded.
2. The reshape((3, 4)) method reorganizes these 12 elements into a matrix with 3 rows and 4 columns. The total number of elements must stay the same (3 * 4 = 12).
3. We use np.dot() (or the @ operator) for matrix multiplication. The inner dimensions match (4 columns in pixel_matrix, 4 rows in transform_matrix), so a (3, 4) matrix multiplied by a (4, 2) matrix yields a result of shape (3, 2). Standard multiplication (*) would instead perform element-wise multiplication, and only if the shapes were compatible under broadcasting, which is not what we want here.

These exercises demonstrate how the NumPy functionality you've learned (creation, indexing, slicing, broadcasting, mathematical operations, and basic linear algebra) works together in practical scenarios. As you move on to libraries like Pandas and Scikit-learn, you'll find that a solid grasp of these NumPy manipulations is indispensable. Keep experimenting with different array shapes, operations, and indexing techniques to build fluency.
© 2025 ApX Machine Learning