Okay, let's put the concepts from this chapter into practice. We'll work through examples applying arithmetic operations, universal functions, statistical calculations, logical operations, and broadcasting to NumPy arrays. Make sure you have NumPy imported, typically using import numpy as np.
First, let's create a couple of arrays to work with. We'll use simple arrays so it's easy to follow the calculations.
import numpy as np
# Array representing daily temperatures (degrees Celsius) for two locations over 5 days
temps_loc_a = np.array([15.0, 17.5, 18.0, 16.5, 19.0])
temps_loc_b = np.array([12.0, 14.0, 13.5, 15.0, 16.0])
# A 2D array representing sensor readings (e.g., voltage) from 3 sensors over 4 time points
sensor_readings = np.array([[1.1, 1.2, 1.0, 1.3],
                            [2.0, 2.2, 2.1, 1.9],
                            [0.8, 0.7, 0.9, 0.8]])
print("Temperatures Location A:", temps_loc_a)
print("Temperatures Location B:", temps_loc_b)
print("Sensor Readings:\n", sensor_readings)
Let's perform some basic calculations.
a) Element-wise Addition: Find the average temperature across both locations for each day.
# Calculate the sum of temperatures for each day
daily_sum = temps_loc_a + temps_loc_b
# Calculate the average temperature for each day
daily_avg = daily_sum / 2.0
# Or more directly: daily_avg = (temps_loc_a + temps_loc_b) / 2
print("Daily Sum:", daily_sum)
print("Daily Average Temp:", daily_avg)
Notice how the addition (+) and division (/) operations were applied element by element.
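To make the element-by-element behaviour concrete, the sketch below compares the vectorized expression with an explicit Python loop over the same values. The names vectorized_avg and loop_avg are illustrative and not part of the original example.

```python
import numpy as np

temps_loc_a = np.array([15.0, 17.5, 18.0, 16.5, 19.0])
temps_loc_b = np.array([12.0, 14.0, 13.5, 15.0, 16.0])

# Vectorized: one expression covers every day at once
vectorized_avg = (temps_loc_a + temps_loc_b) / 2

# Equivalent explicit loop, pairing up the two locations day by day
loop_avg = np.array([(a + b) / 2 for a, b in zip(temps_loc_a, temps_loc_b)])

# Both approaches produce the same values
print(np.array_equal(vectorized_avg, loop_avg))
```

The vectorized form is shorter and avoids looping in Python, which is the main reason to prefer it for larger arrays.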
b) Temperature Conversion: Convert temperatures for Location A from Celsius to Fahrenheit. The formula is F=(C×9/5)+32.
temps_fahrenheit_a = (temps_loc_a * 9/5) + 32
print("Temperatures Location A (Fahrenheit):", temps_fahrenheit_a)
Again, multiplication (*) and addition (+) are applied to each element automatically.
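As a quick sanity check, the conversion can be inverted with C = (F - 32) × 5/9. The following is a minimal sketch; the name temps_back_to_celsius is an assumption made for illustration.

```python
import numpy as np

temps_loc_a = np.array([15.0, 17.5, 18.0, 16.5, 19.0])
temps_fahrenheit_a = (temps_loc_a * 9/5) + 32

# Invert the formula: C = (F - 32) * 5/9
temps_back_to_celsius = (temps_fahrenheit_a - 32) * 5/9

# Compare with a tolerance, since floating-point arithmetic
# is not guaranteed to round-trip exactly
print(np.allclose(temps_back_to_celsius, temps_loc_a))
```

Using np.allclose rather than == is the safer habit whenever results pass through floating-point arithmetic.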
c) Using a ufunc: Calculate the square root of each sensor reading (perhaps for a transformation). We use np.sqrt().
sqrt_readings = np.sqrt(sensor_readings)
print("Square Root of Sensor Readings:\n", sqrt_readings)
NumPy's np.sqrt function is applied to every element of the array in a single call, with no explicit Python loop.
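The sketch below illustrates two properties of a ufunc such as np.sqrt: the output has the same shape as the input, and the values agree with calling math.sqrt on each element individually. The name loop_sqrt is illustrative only.

```python
import math
import numpy as np

sensor_readings = np.array([[1.1, 1.2, 1.0, 1.3],
                            [2.0, 2.2, 2.1, 1.9],
                            [0.8, 0.7, 0.9, 0.8]])

sqrt_readings = np.sqrt(sensor_readings)

# The result keeps the input's shape: one output per input element
print(sqrt_readings.shape)

# Same values as calling math.sqrt on each element in nested loops
loop_sqrt = np.array([[math.sqrt(v) for v in row] for row in sensor_readings])
print(np.allclose(sqrt_readings, loop_sqrt))
```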
Now let's calculate some summary statistics.
a) Overall Statistics: Find the overall minimum, maximum, average, and standard deviation of the temperatures recorded at Location A.
min_temp_a = np.min(temps_loc_a)
max_temp_a = np.max(temps_loc_a)
avg_temp_a = np.mean(temps_loc_a)
std_dev_temp_a = np.std(temps_loc_a) # Standard Deviation
print(f"Location A - Min Temp: {min_temp_a:.2f} C")
print(f"Location A - Max Temp: {max_temp_a:.2f} C")
print(f"Location A - Avg Temp: {avg_temp_a:.2f} C")
print(f"Location A - Std Dev: {std_dev_temp_a:.2f} C")
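One detail worth knowing about np.std: by default it computes the population standard deviation (dividing by N). Passing ddof=1 gives the sample standard deviation (dividing by N - 1). The sketch below shows both; which one is appropriate depends on your data, and the variable names are illustrative.

```python
import numpy as np

temps_loc_a = np.array([15.0, 17.5, 18.0, 16.5, 19.0])

population_std = np.std(temps_loc_a)       # default ddof=0, divides by N
sample_std = np.std(temps_loc_a, ddof=1)   # divides by N - 1

print(f"Population std: {population_std:.2f} C")
print(f"Sample std:     {sample_std:.2f} C")

# The same aggregations are also available as array methods
print(temps_loc_a.min(), temps_loc_a.max(), temps_loc_a.mean())
```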
b) Statistics Along Axes: Calculate the average reading for each sensor (across time) and the average reading at each time point (across sensors) from our sensor_readings array.
Remember:
axis=0 operates along the rows (calculates the statistic for each column).
axis=1 operates along the columns (calculates the statistic for each row).
# Average reading per time point (across sensors) - collapses rows
avg_reading_per_timepoint = np.mean(sensor_readings, axis=0)
# Average reading per sensor (across time) - collapses columns
avg_reading_per_sensor = np.mean(sensor_readings, axis=1)
print("Average Reading per Time Point:", avg_reading_per_timepoint)
print("Average Reading per Sensor:", avg_reading_per_sensor)
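A helpful way to reason about the axis argument is that the axis you name is the one that disappears from the result's shape. The following sketch checks this on the same data; per_timepoint and per_sensor are illustrative names.

```python
import numpy as np

sensor_readings = np.array([[1.1, 1.2, 1.0, 1.3],
                            [2.0, 2.2, 2.1, 1.9],
                            [0.8, 0.7, 0.9, 0.8]])

# 3 sensors (rows) by 4 time points (columns)
print(sensor_readings.shape)

per_timepoint = sensor_readings.mean(axis=0)  # axis 0 (length 3) is collapsed
per_sensor = sensor_readings.mean(axis=1)     # axis 1 (length 4) is collapsed

print(per_timepoint.shape)  # one value per time point
print(per_sensor.shape)     # one value per sensor
```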
We can visualize the average reading per sensor using a simple bar chart.
Figure: Average voltage reading calculated for each sensor across all time points.
Let's identify days where the temperature at Location A was above 17 degrees Celsius.
# Create a boolean array based on the condition
hot_days_a = temps_loc_a > 17.0
print("Days Temp > 17 C at Loc A:", hot_days_a)
# Use boolean indexing to select the temperatures on those days
hot_temps_a = temps_loc_a[hot_days_a]
# You can also do this in one step: hot_temps_a = temps_loc_a[temps_loc_a > 17.0]
print("Temperatures on hot days at Loc A:", hot_temps_a)
# How many hot days were there?
num_hot_days = np.sum(hot_days_a) # True counts as 1, False as 0
print("Number of days Temp > 17 C:", num_hot_days)
The comparison temps_loc_a > 17.0 returns a boolean array ([False, True, True, False, True]). This array is then used as an index to select only the elements from temps_loc_a where the corresponding value in the boolean array is True.
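Boolean arrays can also be combined before indexing. Here is a minimal sketch, assuming we want days that were above 17 degrees at Location A and also more than 3 degrees warmer than Location B. The 3-degree threshold and the name warm_and_much_warmer are chosen only for illustration.

```python
import numpy as np

temps_loc_a = np.array([15.0, 17.5, 18.0, 16.5, 19.0])
temps_loc_b = np.array([12.0, 14.0, 13.5, 15.0, 16.0])

# Use & (element-wise AND) with parentheses around each comparison;
# Python's `and` keyword does not work element-wise on arrays
warm_and_much_warmer = (temps_loc_a > 17.0) & (temps_loc_a - temps_loc_b > 3.0)

print(warm_and_much_warmer)
print(temps_loc_a[warm_and_much_warmer])

# np.flatnonzero returns the positions (day indices) where the mask is True
print(np.flatnonzero(warm_and_much_warmer))
```

The last day is excluded because its difference is exactly 3.0, which does not satisfy the strict comparison.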
Broadcasting allows operations between arrays of different shapes if they are compatible.
a) Adding a Scalar: Add a constant adjustment (e.g., 0.5) to all sensor readings.
adjusted_readings = sensor_readings + 0.5
print("Adjusted Sensor Readings:\n", adjusted_readings)
Here, the scalar 0.5 is effectively "stretched" or broadcast to match the shape of sensor_readings before the addition occurs.
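To see what that stretching means, the sketch below builds the full-size array of 0.5 values explicitly and confirms it gives the same result. This is only a conceptual illustration: NumPy does not actually allocate such an array when broadcasting a scalar. The name offset is hypothetical.

```python
import numpy as np

sensor_readings = np.array([[1.1, 1.2, 1.0, 1.3],
                            [2.0, 2.2, 2.1, 1.9],
                            [0.8, 0.7, 0.9, 0.8]])

# Explicitly build the array that the scalar is conceptually stretched into
offset = np.full(sensor_readings.shape, 0.5)

adjusted_explicit = sensor_readings + offset
adjusted_broadcast = sensor_readings + 0.5

print(np.array_equal(adjusted_explicit, adjusted_broadcast))
```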
b) Operating with a 1D Array: Let's say we have a baseline value for each time point (perhaps an average from previous days) and want to see the difference from the current readings for Sensor 1.
baseline_per_timepoint = np.array([1.0, 1.1, 0.9, 1.2])
sensor1_readings = sensor_readings[0, :] # First row
difference_sensor1 = sensor1_readings - baseline_per_timepoint
print("Sensor 1 Readings:", sensor1_readings)
print("Baseline per Time Point:", baseline_per_timepoint)
print("Difference for Sensor 1:", difference_sensor1)
Both sensor1_readings and baseline_per_timepoint are 1D arrays of the same size (4 elements), so element-wise subtraction works directly.
c) Broadcasting a 1D array to a 2D array: Now, let's subtract the baseline_per_timepoint from all sensor readings.
difference_all_sensors = sensor_readings - baseline_per_timepoint
print("Difference from Baseline for All Sensors:\n", difference_all_sensors)
How did this work? sensor_readings has shape (3, 4) and baseline_per_timepoint has shape (4,). NumPy's broadcasting rules compare dimensions from right to left:
The trailing dimensions are both 4, so they match.
The next dimension of sensor_readings is 3, but baseline_per_timepoint has no more dimensions.
NumPy "stretches" or duplicates baseline_per_timepoint along the missing dimension (rows in this case) to effectively create a (3, 4) array matching sensor_readings. The subtraction then happens element-wise. It's as if NumPy performed the subtraction like this:
[[1.1, 1.2, 1.0, 1.3],     [[1.0, 1.1, 0.9, 1.2],
 [2.0, 2.2, 2.1, 1.9],  -   [1.0, 1.1, 0.9, 1.2],
 [0.8, 0.7, 0.9, 0.8]]      [1.0, 1.1, 0.9, 1.2]]
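The stretched array can be inspected with np.broadcast_to, and the same right-to-left rule explains why a per-sensor array of shape (3,) does not broadcast against shape (3, 4) without reshaping. The following is a sketch; baseline_per_sensor and its values are hypothetical and not part of the original data.

```python
import numpy as np

sensor_readings = np.array([[1.1, 1.2, 1.0, 1.3],
                            [2.0, 2.2, 2.1, 1.9],
                            [0.8, 0.7, 0.9, 0.8]])
baseline_per_timepoint = np.array([1.0, 1.1, 0.9, 1.2])

# np.broadcast_to exposes the stretched (3, 4) view without copying data
stretched = np.broadcast_to(baseline_per_timepoint, sensor_readings.shape)
print(stretched.shape)
print(np.array_equal(sensor_readings - stretched,
                     sensor_readings - baseline_per_timepoint))

# Hypothetical per-sensor baseline: shape (3,) does not match the
# trailing dimension (4), so broadcasting fails
baseline_per_sensor = np.array([1.0, 2.0, 0.8])
try:
    sensor_readings - baseline_per_sensor
except ValueError as err:
    print("Incompatible shapes:", err)

# Reshaping to (3, 1) lets the values broadcast across the columns instead
difference_per_sensor = sensor_readings - baseline_per_sensor[:, np.newaxis]
print(difference_per_sensor.shape)
```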
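In this hands-on session, you applied the fundamental numerical operations covered in this chapter:
Arithmetic operations: element-wise addition, division, multiplication, and subtraction on arrays.
Universal functions: mathematical functions such as np.sqrt applied to every element.
Statistical calculations: aggregations (np.mean, np.min, np.max, np.std) for entire arrays and along specific axes.
Logical operations: comparison operators (>) to create boolean arrays for filtering data (boolean indexing).
Broadcasting: operations between arrays of different but compatible shapes, such as a scalar or a 1D array with a 2D array.
These operations are the workhorses of numerical computing with NumPy and form the basis for many data analysis tasks you'll encounter. Experiment further by creating your own arrays and trying different functions and operations.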
© 2025 ApX Machine Learning