In the previous section, we explored matrix multiplication purely as a computational procedure. Now, let's look at its geometric meaning. Multiplying a matrix by a vector isn't just symbol manipulation; it's a way to transform that vector (or the point it represents) in space. This concept of transformation is fundamental in machine learning, used for tasks ranging from changing coordinate systems to manipulating data representations.
Think of a matrix A as an operator or function that takes an input vector x and produces an output vector y through multiplication: y=Ax. This operation is called a linear transformation.
What makes a transformation "linear"? It must satisfy two properties for any vectors u, v and any scalar c: additivity, $T(u + v) = T(u) + T(v)$, and homogeneity, $T(cu) = c\,T(u)$.
Matrix multiplication inherently satisfies these conditions. Geometrically, linear transformations preserve the origin (the zero vector maps to the zero vector) and keep grid lines parallel and evenly spaced, although the spacing and orientation might change. They can stretch, shrink, rotate, reflect, or shear space, but they don't curve it.
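For instance, here is a minimal NumPy check of both properties; the matrix A and the vectors u and v below are arbitrary example values chosen only for illustration:

```python
import numpy as np

# An arbitrary 2x2 matrix acting as a linear transformation (example values)
A = np.array([[2.0, 1.0],
              [0.0, 3.0]])

u = np.array([1.0, 2.0])
v = np.array([-1.0, 0.5])
c = 4.0

# Additivity: A(u + v) equals Au + Av
print(np.allclose(A @ (u + v), A @ u + A @ v))  # True

# Homogeneity: A(cu) equals c(Au)
print(np.allclose(A @ (c * u), c * (A @ u)))    # True
```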
Let's examine some common linear transformations represented by 2x2 matrices acting on 2D vectors. We'll visualize how they affect the vertices of a unit square: (0,0), (1,0), (1,1), (0,1).
A scaling transformation stretches or shrinks space along the coordinate axes. It's represented by a diagonal matrix.
Consider the matrix $S = \begin{bmatrix} 2 & 0 \\ 0 & 0.5 \end{bmatrix}$. Let's apply it to a vector $x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$:

$$Sx = \begin{bmatrix} 2 & 0 \\ 0 & 0.5 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 2x_1 \\ 0.5x_2 \end{bmatrix}$$

This transformation doubles the $x_1$ component and halves the $x_2$ component of any vector. Applying this to our unit square vertices:
Scaling transformation [[2, 0], [0, 0.5]] applied to the unit square. The square is stretched horizontally and compressed vertically.
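Here is a short NumPy sketch of this scaling applied to all four unit-square vertices at once, stored as the columns of a 2x4 array:

```python
import numpy as np

S = np.array([[2.0, 0.0],
              [0.0, 0.5]])

# Unit square vertices (0,0), (1,0), (1,1), (0,1) stored as columns
square = np.array([[0.0, 1.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0, 1.0]])

print(S @ square)
# [[0.  2.  2.  0. ]
#  [0.  0.  0.5 0.5]]
```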
A rotation transformation rotates vectors around the origin by a specific angle θ. The standard 2D rotation matrix is:
$$R_\theta = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$$

Let's rotate by $\theta = 90^\circ$ ($\cos 90^\circ = 0$, $\sin 90^\circ = 1$). The matrix becomes $R_{90} = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$.
Applying this to our unit square vertices:
Rotation transformation [[0, -1], [1, 0]] applied to the unit square, rotating it 90 degrees counter-clockwise.
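The same vertex trick works for the rotation; here $R_\theta$ is built directly from the formula above, with a small rounding step to hide the tiny floating-point residue left by cos and sin:

```python
import numpy as np

theta = np.deg2rad(90)
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

# Unit square vertices as columns
square = np.array([[0.0, 1.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0, 1.0]])

print(np.round(R @ square, 6))
# [[ 0.  0. -1. -1.]
#  [ 0.  1.  1.  0.]]
```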
A shear transformation slants the space, effectively sliding layers relative to each other. A horizontal shear is often represented by a matrix like $H = \begin{bmatrix} 1 & s \\ 0 & 1 \end{bmatrix}$, where $s$ is the shear factor.
Let's use $s = 1$, so $H = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$. Applying this to a vector $x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$:

$$Hx = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} x_1 + x_2 \\ x_2 \end{bmatrix}$$
The transformation adds the $x_2$ coordinate to the $x_1$ coordinate, shifting points horizontally based on their height. Applying this to our unit square vertices:
Horizontal shear transformation [[1, 1], [0, 1]] applied to the unit square. The top edge is shifted right.
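And the shear, applied to the same unit-square columns as before:

```python
import numpy as np

H = np.array([[1.0, 1.0],
              [0.0, 1.0]])

# Unit square vertices as columns
square = np.array([[0.0, 1.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0, 1.0]])

print(H @ square)
# [[0. 1. 2. 1.]
#  [0. 0. 1. 1.]]
# The bottom edge (x2 = 0) stays put; the top edge (x2 = 1) slides right by 1.
```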
What happens if we apply one transformation followed by another? For example, first rotate by 90 degrees ($R_{90}$) and then scale ($S$)? We apply the transformations sequentially: $y = S(R_{90}x)$.
Because matrix multiplication is associative, this is equivalent to $y = (SR_{90})x$. The combined transformation is represented by the product of the individual transformation matrices.
Important: Matrix multiplication is generally not commutative ($AB \neq BA$). This means the order in which you apply transformations matters: scaling then rotating typically yields a different result than rotating then scaling. The matrix corresponding to the transformation applied first appears on the right side of the product.
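Both points are easy to confirm with the scaling and rotation matrices from above; the test vector x here is just an arbitrary example:

```python
import numpy as np

S = np.array([[2.0, 0.0],
              [0.0, 0.5]])
R90 = np.array([[0.0, -1.0],
                [1.0,  0.0]])

x = np.array([1.0, 1.0])  # arbitrary test vector

# Rotate first, then scale: the combined matrix is S @ R90
print(S @ (R90 @ x))                  # [-2.   0.5]
print((S @ R90) @ x)                  # [-2.   0.5]  (associativity)

# Reversing the order gives a different transformation and a different result
print((R90 @ S) @ x)                  # [-0.5  2. ]
print(np.allclose(S @ R90, R90 @ S))  # False: order matters
```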
Understanding matrices as linear transformations is significant in ML, where tasks such as changing coordinate systems and manipulating data representations are carried out by exactly these kinds of matrix products.
In the next sections and chapters, we'll see how these geometric interpretations connect to solving linear systems, understanding vector spaces, and performing powerful matrix decompositions used throughout machine learning. Remember that NumPy provides efficient tools (like the @ operator or np.dot()) to perform these matrix multiplications, effectively applying the transformations we've discussed.