Okay, let's put the theory into practice. You've learned how vectors represent data and the fundamental operations we can perform on them. Now, we'll use Python's NumPy library to work with feature vectors directly. This hands-on exercise will solidify your understanding of vector manipulation, norms, dot products, and distances – operations that are performed constantly in machine learning algorithms.
Imagine we have data representing simplified profiles of two online users based on their engagement (e.g., hours spent) with three types of content: articles, videos, and podcasts.
We can represent these as vectors in $\mathbb{R}^3$.
First, let's import NumPy and create these vectors.
import numpy as np
# User engagement vectors (hours)
user_a = np.array([10, 5, 2])
user_b = np.array([8, 9, 3])
print(f"User A vector: {user_a}")
print(f"User B vector: {user_b}")
Let's say we want to analyze the difference in engagement between these two users. We can simply subtract one vector from the other.
# Calculate the difference vector
difference = user_a - user_b
print(f"Difference (A - B): {difference}")
The result, [2, -4, -1], shows that User A spent 2 more hours on articles, 4 fewer hours on videos, and 1 fewer hour on podcasts than User B.
Now, suppose we want to project what User A's engagement might look like if they increased their activity by 50% across all categories. This is a scalar multiplication.
# Scale User A's engagement by 1.5 (150%)
scaled_user_a = user_a * 1.5
print(f"Scaled User A: {scaled_user_a}")
How can we quantify the "overall engagement" of each user? Vector norms give us a sense of magnitude. Let's calculate the L2 (Euclidean) and L1 (Manhattan) norms.
The L2 norm is calculated as $||v||_2 = \sqrt{\sum_i v_i^2}$.
# L2 Norm (Euclidean)
l2_norm_a = np.linalg.norm(user_a) # Default is L2 norm
l2_norm_b = np.linalg.norm(user_b)
print(f"L2 Norm (User A): {l2_norm_a:.2f}")
print(f"L2 Norm (User B): {l2_norm_b:.2f}")
The L2 norm gives a straight-line distance from the origin in the 3D feature space. It provides a general magnitude considering all components.
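If you want to see where those numbers come from, the norm can also be computed straight from the definition. Here is a small verification sketch, reusing the user_a and user_b arrays defined above:

# L2 norm from the definition: square each component, sum, take the square root
manual_l2_a = np.sqrt(np.sum(user_a ** 2))
manual_l2_b = np.sqrt(np.sum(user_b ** 2))
print(f"Manual L2 Norm (User A): {manual_l2_a:.2f}")  # matches np.linalg.norm(user_a)
print(f"Manual L2 Norm (User B): {manual_l2_b:.2f}")  # matches np.linalg.norm(user_b)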
The L1 norm is calculated as $||v||_1 = \sum_i |v_i|$.
# L1 Norm (Manhattan)
l1_norm_a = np.linalg.norm(user_a, ord=1)
l1_norm_b = np.linalg.norm(user_b, ord=1)
print(f"L1 Norm (User A): {l1_norm_a:.2f}")
print(f"L1 Norm (User B): {l1_norm_b:.2f}")
The L1 norm represents the total sum of engagement hours across categories. In this context, it's perhaps more directly interpretable as total time spent. User B has a higher total engagement time (20 hours) than User A (17 hours), and here User B's L2 norm is larger as well; in general, though, the two norms can order vectors differently, because each highlights a different aspect of a vector's magnitude.
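The L1 values also fall out of the definition directly, which makes the "total hours" reading explicit (again reusing the arrays from above):

# L1 norm from the definition: sum of the absolute values of the components
manual_l1_a = np.sum(np.abs(user_a))
manual_l1_b = np.sum(np.abs(user_b))
print(f"Manual L1 Norm (User A): {manual_l1_a}")  # 10 + 5 + 2 = 17
print(f"Manual L1 Norm (User B): {manual_l1_b}")  # 8 + 9 + 3 = 20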
The dot product helps measure the alignment or similarity between two vectors. A higher positive dot product suggests the vectors point in more similar directions.
The dot product is $A \cdot B = \sum_i A_i B_i$.
# Calculate the dot product
dot_product = np.dot(user_a, user_b)
print(f"Dot Product (A . B): {dot_product}")
The result (131) is positive, indicating some alignment in their engagement patterns. To get a standardized measure of similarity, independent of the magnitude (total hours), we can calculate the cosine similarity:
$$\cos(\theta) = \frac{A \cdot B}{||A||_2 \, ||B||_2}$$
# Calculate cosine similarity
cosine_similarity = dot_product / (l2_norm_a * l2_norm_b)
print(f"Cosine Similarity: {cosine_similarity:.2f}")
A cosine similarity close to 1 means the vectors point in very similar directions (proportional engagement across categories), while a value close to 0 indicates orthogonality (very different patterns), and -1 indicates opposite directions. Here, 0.93 suggests relatively similar engagement profiles, despite differences in specific categories. This metric is widely used in recommendation systems and information retrieval.
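One way to see the magnitude-independence is to compare User A with the scaled version of their own profile computed earlier; since scaling does not change direction, the similarity should come out as 1. This is a small illustrative check rather than part of the analysis:

# Scaling a vector does not change its direction, so cosine similarity stays at 1
dot_scaled = np.dot(user_a, scaled_user_a)
cos_scaled = dot_scaled / (np.linalg.norm(user_a) * np.linalg.norm(scaled_user_a))
print(f"Cosine Similarity (A vs. 1.5 * A): {cos_scaled:.2f}")  # 1.00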
How "far apart" are these users in terms of their engagement profiles? We can use the norms to calculate distances. The most common is the Euclidean distance, which is simply the L2 norm of the difference vector we calculated earlier.
$$\text{Distance}(A, B) = ||A - B||_2$$
# Calculate Euclidean distance
euclidean_distance = np.linalg.norm(difference) # L2 norm of the difference vector
# Alternatively: np.linalg.norm(user_a - user_b)
print(f"Euclidean Distance between User A and User B: {euclidean_distance:.2f}")
This distance gives a single number representing how different the two users' engagement vectors are in the 3D space. Lower distances imply more similar users. This concept is fundamental to algorithms like k-Nearest Neighbors (k-NN).
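To sketch how k-NN would use this, imagine a hypothetical third user, user_c, whose engagement values below are invented purely for illustration. The nearest neighbor is simply the existing user at the smallest Euclidean distance:

# Hypothetical new user (articles, videos, podcasts); values invented for illustration
user_c = np.array([9, 6, 2])

# Euclidean distance from the new user to each existing user
dist_to_a = np.linalg.norm(user_c - user_a)
dist_to_b = np.linalg.norm(user_c - user_b)
print(f"Distance C to A: {dist_to_a:.2f}")
print(f"Distance C to B: {dist_to_b:.2f}")

# With k = 1, the nearest neighbor is the user with the smaller distance
nearest = "User A" if dist_to_a < dist_to_b else "User B"
print(f"Nearest neighbor of User C: {nearest}")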
We could also calculate the Manhattan distance (L1 norm of the difference):
# Calculate Manhattan distance
manhattan_distance = np.linalg.norm(difference, ord=1)
print(f"Manhattan Distance between User A and User B: {manhattan_distance:.2f}")
The Manhattan distance sums the absolute differences along each axis ($|2| + |-4| + |-1| = 7$). It represents the distance you would travel if you could only move along the grid lines of the feature space axes.
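You can confirm that arithmetic by summing the absolute component differences yourself:

# Manhattan distance from the definition: sum of absolute differences per component
manual_manhattan = np.sum(np.abs(user_a - user_b))
print(f"Manual Manhattan Distance: {manual_manhattan}")  # |2| + |-4| + |-1| = 7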
In this practical exercise, you used NumPy to create feature vectors, subtract and scale them, compute L1 and L2 norms, measure alignment with the dot product and cosine similarity, and quantify differences with Euclidean and Manhattan distances.
These operations are the building blocks for many machine learning techniques. Being comfortable with their calculation and interpretation using tools like NumPy is an important step in applying linear algebra effectively. As you progress, you'll see these operations applied to vectors with many more dimensions, representing complex data features.
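As a small preview, the exact same NumPy calls work unchanged on much longer vectors. The sketch below uses randomly generated 100-dimensional vectors purely as stand-ins for richer feature profiles:

# The same operations applied to 100-dimensional feature vectors
rng = np.random.default_rng(seed=42)
x = rng.random(100)
y = rng.random(100)

print(f"L2 norm of x: {np.linalg.norm(x):.2f}")
print(f"Dot product x . y: {np.dot(x, y):.2f}")
print(f"Cosine similarity: {np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)):.2f}")
print(f"Euclidean distance: {np.linalg.norm(x - y):.2f}")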