In machine learning, we often need to quantify the "size" or "magnitude" of a vector. While we intuitively understand length in 2D or 3D space, vectors representing data features can exist in much higher dimensions. We need a consistent way to measure this length. This is where the concept of a norm comes in. A norm is a function that takes a vector as input and returns a non-negative scalar value representing its magnitude. Think of it as a formal generalization of the concept of length.
Different norms measure length in different ways, and the choice of norm can have significant implications in machine learning algorithms, particularly in areas like regularization and error calculation. Let's explore the most common norms you'll encounter.
The most familiar norm is the L2 norm, also known as the Euclidean norm. It corresponds to the standard geometric length of a vector, calculated as the square root of the sum of the squares of its components. For an n-dimensional vector $v = [v_1, v_2, \dots, v_n]$, the L2 norm is defined as:
$$\|v\|_2 = \sqrt{v_1^2 + v_2^2 + \dots + v_n^2} = \sqrt{\sum_{i=1}^{n} v_i^2}$$
This is essentially the Pythagorean theorem generalized to higher dimensions. It represents the shortest straight-line distance from the origin to the point represented by the vector.
Example: Consider the vector $v = [3, 4]$. Its L2 norm is: $\|v\|_2 = \sqrt{3^2 + 4^2} = \sqrt{9 + 16} = \sqrt{25} = 5$
import numpy as np
v = np.array([3, 4])
l2_norm = np.linalg.norm(v) # Default norm is L2
print(f"Vector: {v}")
print(f"L2 Norm: {l2_norm}")
# Output:
# Vector: [3 4]
# L2 Norm: 5.0
The L2 norm measures the direct Euclidean distance from the origin (0,0) to the point (3,4).
In machine learning, the L2 norm is frequently used:
- In L2 (ridge) regularization, where the squared L2 norm of the weight vector is added to the loss to discourage large weights (see the sketch below).
- As the Euclidean distance between data points, central to algorithms such as k-nearest neighbors and k-means clustering.
- In the mean squared error loss, which is proportional to the squared L2 norm of the residual vector.
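For instance, here is a minimal sketch of computing a ridge-style L2 penalty; the weight vector and regularization strength lam are made-up values for illustration:

import numpy as np

# Hypothetical weight vector and regularization strength (illustrative values)
weights = np.array([0.5, -1.2, 3.0, 0.0])
lam = 0.1

# Ridge regularization adds lam * ||w||_2^2 to the training loss
l2_penalty = lam * np.linalg.norm(weights) ** 2
print(f"L2 penalty: {l2_penalty:.3f}")
# Output: L2 penalty: 1.069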
Another important norm is the L1 norm, often called the Manhattan norm or Taxicab norm. Instead of squaring components, it sums their absolute values:
$$\|v\|_1 = |v_1| + |v_2| + \dots + |v_n| = \sum_{i=1}^{n} |v_i|$$
The name "Manhattan norm" comes from the idea of navigating a grid-like city plan. You can only travel along the grid lines (like city blocks), not diagonally. The L1 norm represents the total distance traveled along these axes to get from the origin to the vector's endpoint.
Example: For the same vector $v = [3, 4]$, the L1 norm is: $\|v\|_1 = |3| + |4| = 3 + 4 = 7$
import numpy as np
v = np.array([3, 4])
l1_norm = np.linalg.norm(v, ord=1)
print(f"Vector: {v}")
print(f"L1 Norm: {l1_norm}")
# Output:
# Vector: [3 4]
# L1 Norm: 7.0
The L1 norm measures the sum of the absolute values of the components, like moving along grid lines from (0,0) to (3,0) and then to (3,4).
The L1 norm has distinct properties useful in machine learning:
- In L1 (lasso) regularization, where penalizing the L1 norm of the weights tends to drive some of them exactly to zero, producing sparse models (a sketch of why follows this list).
- As the Manhattan distance between data points, an alternative to Euclidean distance.
- In the mean absolute error loss, which penalizes deviations linearly and is therefore less sensitive to outliers than squared error.
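To see why L1 penalties produce exact zeros, consider the simplest L1-regularized problem: minimizing $\frac{1}{2}(w - a)^2 + \lambda |w|$ over a scalar $w$. Its closed-form solution is the soft-thresholding operator, which snaps small values exactly to zero. A minimal sketch:

import numpy as np

def soft_threshold(a, lam):
    # Closed-form minimizer of 0.5 * (w - a)**2 + lam * |w|
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)

a = np.array([3.0, 0.4, -0.2, -2.5])
print(soft_threshold(a, lam=0.5))
# Components with |a_i| <= 0.5 become exactly zero:
# [ 2.5  0.  -0.  -2. ]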
The key difference lies in how they treat component magnitudes. The L2 norm squares values, heavily penalizing large components. The L1 norm takes absolute values, treating deviations linearly.
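A quick numerical comparison makes this concrete: the two vectors below have the same L1 norm, but the one that concentrates its magnitude in a single large component has twice the L2 norm of the one that spreads it evenly.

import numpy as np

spread = np.array([1.0, 1.0, 1.0, 1.0])  # magnitude spread evenly
peaked = np.array([4.0, 0.0, 0.0, 0.0])  # magnitude in one large component

for name, v in [("spread", spread), ("peaked", peaked)]:
    print(f"{name}: L1 = {np.linalg.norm(v, ord=1)}, L2 = {np.linalg.norm(v)}")
# Output:
# spread: L1 = 4.0, L2 = 2.0
# peaked: L1 = 4.0, L2 = 4.0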
This difference is visually apparent when considering all vectors with a norm equal to 1. For the L2 norm, this forms a circle (in 2D) or a hypersphere (in higher dimensions). For the L1 norm, it forms a diamond (in 2D) or a cross-polytope (in higher dimensions).
Figure: vectors with $\|v\|_2 = 1$ (blue circle) and $\|v\|_1 = 1$ (red diamond) in two dimensions. The "corners" of the L1 diamond lie on the axes, contributing to its tendency to favor sparse solutions (where some components are exactly zero) in optimization problems.
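If you want to reproduce this picture, a short matplotlib sketch (assuming matplotlib is installed) is:

import numpy as np
import matplotlib.pyplot as plt

# L2 unit ball: the circle of points with x^2 + y^2 = 1
theta = np.linspace(0, 2 * np.pi, 400)
plt.plot(np.cos(theta), np.sin(theta), "b", label="L2 norm = 1")

# L1 unit ball: the diamond |x| + |y| = 1, traced through its four corners
plt.plot([1, 0, -1, 0, 1], [0, 1, 0, -1, 0], "r", label="L1 norm = 1")

plt.gca().set_aspect("equal")
plt.legend()
plt.show()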
The L1 and L2 norms are specific cases of a broader family called Lp norms, defined as:
$$\|v\|_p = \left( \sum_{i=1}^{n} |v_i|^p \right)^{1/p}$$
where $p \geq 1$.
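NumPy's np.linalg.norm accepts any such p through its ord argument, so you can watch the Lp norm shrink from the L1 value toward the largest component as p grows:

import numpy as np

v = np.array([3, 4])

# ord can be any p >= 1 for a 1-D array
for p in [1, 1.5, 2, 3, 10]:
    print(f"p = {p}: {np.linalg.norm(v, ord=p):.4f}")
# The printed values decrease from 7.0 toward max(|3|, |4|) = 4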
Another norm sometimes encountered is the L∞ norm (or max norm), which is the limit of the Lp norm as p approaches infinity. It simply returns the maximum absolute value among the vector's components:
$$\|v\|_\infty = \max_i |v_i|$$
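In NumPy, the max norm is computed by passing ord=np.inf:

import numpy as np

v = np.array([3, 4])
linf_norm = np.linalg.norm(v, ord=np.inf)
print(f"L-infinity Norm: {linf_norm}")
# Output: L-infinity Norm: 4.0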
While L1 and L2 are the workhorses in many machine learning applications, being aware of the general Lp family provides broader context.
Understanding vector norms is fundamental because they provide ways to measure:
- The magnitude of a vector, such as the size of a weight vector or a gradient.
- The distance between two points, computed as the norm of their difference (see the sketch below).
- The size of an error, as in loss functions built from the norm of a residual vector.
- The complexity of a model, as in norm-based regularization penalties.
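For example, the distance between two points is just the norm of their difference:

import numpy as np

a = np.array([1.0, 2.0])
b = np.array([4.0, 6.0])

# Distance between points = norm of the difference vector
print(f"Euclidean (L2) distance: {np.linalg.norm(a - b)}")         # sqrt(9 + 16) = 5.0
print(f"Manhattan (L1) distance: {np.linalg.norm(a - b, ord=1)}")  # 3 + 4 = 7.0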
Being able to calculate these norms efficiently, often using libraries like NumPy, is a core skill for implementing and understanding many machine learning algorithms.