Matrix-vector multiplication is an operation that fundamentally transforms a vector. Think of a matrix as a function or a machine: you feed it a vector, and it outputs a new vector, possibly with a different length and pointing in a new direction. This operation is at the heart of many machine learning algorithms, from running a neural network layer to performing geometric transformations on data.
To multiply a matrix A by a vector v, there is one important rule of compatibility: the number of columns in the matrix must equal the number of elements in the vector.
If you have an m×n matrix (m rows, n columns) and an n×1 vector (n rows, 1 column), the result will be a new m×1 vector.
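As a small plain-Python sketch of this rule (the values here are arbitrary and only illustrate the shape check):
import numpy as np  # used later in this section; not needed for the check itself
A = [[3, 1, 4],
     [0, 2, 5]]   # m = 2 rows, n = 3 columns
v = [2, 6, 1]     # n = 3 elements: compatible with A
w = [2, 6]        # only 2 elements: not compatible with A
# The product Av is defined only when the column count of A matches the length of v,
# and the result then has one element per row of A.
print(len(A[0]) == len(v))  # True  -- Av is defined; the result has 2 elements
print(len(A[0]) == len(w))  # False -- Aw is not defined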
Each element in the resulting vector is calculated by taking the dot product of a row from the matrix and the input vector. Let's see this with an example.
Suppose we have a 2×3 matrix A and a 3×1 vector v:
$$A = \begin{bmatrix} 3 & 1 & 4 \\ 0 & 2 & 5 \end{bmatrix}, \quad v = \begin{bmatrix} 2 \\ 6 \\ 1 \end{bmatrix}$$
To find the first element of our new vector, we take the dot product of the first row of A and the vector v:
$$(3 \cdot 2) + (1 \cdot 6) + (4 \cdot 1) = 6 + 6 + 4 = 16$$
To find the second element, we take the dot product of the second row of A and the vector v:
$$(0 \cdot 2) + (2 \cdot 6) + (5 \cdot 1) = 0 + 12 + 5 = 17$$
So, the resulting vector is:
$$Av = \begin{bmatrix} 16 \\ 17 \end{bmatrix}$$
Notice how we transformed a 3-dimensional vector into a 2-dimensional one. This ability to change the dimensionality of data is a direct consequence of matrix-vector multiplication.
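If it helps to see the row-wise recipe spelled out step by step, here is a small sketch in plain Python (no NumPy yet) that computes each output element as an explicit dot product:
A = [[3, 1, 4],
     [0, 2, 5]]
v = [2, 6, 1]
result = []
for row in A:
    # dot product of this row with v gives one element of the output
    result.append(sum(r * x for r, x in zip(row, v)))
print(result)  # [16, 17]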
There's another, more insightful way to look at the same operation. The resulting vector is actually a linear combination of the columns of the matrix, where the elements of the input vector act as the weights.
Using the same matrix A and vector v:
$$A = \begin{bmatrix} 3 & 1 & 4 \\ 0 & 2 & 5 \end{bmatrix}, \quad v = \begin{bmatrix} 2 \\ 6 \\ 1 \end{bmatrix}$$
We can rewrite the multiplication as:
$$Av = 2\begin{bmatrix} 3 \\ 0 \end{bmatrix} + 6\begin{bmatrix} 1 \\ 2 \end{bmatrix} + 1\begin{bmatrix} 4 \\ 5 \end{bmatrix}$$
Let's calculate this:
$$Av = \begin{bmatrix} 6 \\ 0 \end{bmatrix} + \begin{bmatrix} 6 \\ 12 \end{bmatrix} + \begin{bmatrix} 4 \\ 5 \end{bmatrix} = \begin{bmatrix} 6 + 6 + 4 \\ 0 + 12 + 5 \end{bmatrix} = \begin{bmatrix} 16 \\ 17 \end{bmatrix}$$
We get the exact same result. This perspective is powerful because it tells us that the product Av is a point in the space spanned by the columns of A. In machine learning, this often translates to combining features (the columns) according to input weights (the vector).
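If you want to verify the column view numerically, here is a short sketch that weights each column of A by the matching element of v and adds them up:
A = [[3, 1, 4],
     [0, 2, 5]]
v = [2, 6, 1]
# Weight each column of A by the matching element of v, then add the results
combo = [sum(v[j] * A[i][j] for j in range(len(v))) for i in range(len(A))]
print(combo)  # [16, 17] -- the same vector as before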
One of the most intuitive ways to understand matrix-vector multiplication is to see it as a geometric transformation. A matrix can rotate, scale, or shear a vector in space.
Let's take a simple 2D vector and a matrix that performs a "shear" transformation. A shear transformation slants the space, making squares into parallelograms.
Our vector is $v = \begin{bmatrix} 2 \\ 3 \end{bmatrix}$ and our shear matrix is $M = \begin{bmatrix} 1 & 0.5 \\ 0 & 1 \end{bmatrix}$.
Let's compute the product Mv:
$$Mv = \begin{bmatrix} 1 & 0.5 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 2 \\ 3 \end{bmatrix} = \begin{bmatrix} (1 \cdot 2) + (0.5 \cdot 3) \\ (0 \cdot 2) + (1 \cdot 3) \end{bmatrix} = \begin{bmatrix} 2 + 1.5 \\ 0 + 3 \end{bmatrix} = \begin{bmatrix} 3.5 \\ 3 \end{bmatrix}$$
The matrix M has transformed our original vector, pushing its head to the right while keeping its y-coordinate the same.
The vector v (blue) is transformed by matrix M into the vector Mv (red). The shear transformation has shifted the vector horizontally.
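If you want to reproduce this shear numerically, here is a small sketch in plain Python using the same row-by-row recipe as before:
M = [[1, 0.5],
     [0, 1]]   # shear matrix: slants space horizontally
v = [2, 3]
# Each output element is the dot product of a row of M with v
Mv = [sum(m * x for m, x in zip(row, v)) for row in M]
print(Mv)  # [3.5, 3] -- x shifted from 2 to 3.5, y unchanged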
NumPy makes matrix-vector multiplication straightforward. The modern and recommended operator for this is @ (the matrix multiplication operator).
Let's perform the same calculation from our first example in Python.
import numpy as np
# Define the 2x3 matrix A
A = np.array([
[3, 1, 4],
[0, 2, 5]
])
# Define the vector v (a 1-D NumPy array with 3 elements)
v = np.array([2, 6, 1])
# Perform matrix-vector multiplication
result = A @ v
print(f"Matrix A:\n{A}")
print(f"\nVector v:\n{v}")
print(f"\nResult of A @ v:\n{result}")
print(f"\nShape of the result: {result.shape}")
Running this code will produce the following output:
Matrix A:
[[3 1 4]
[0 2 5]]
Vector v:
[2 6 1]
Result of A @ v:
[16 17]
Shape of the result: (2,)
This matches our manual calculation perfectly. NumPy handles the dot products for each row automatically. You might also see np.dot(A, v) used in older codebases, which achieves the same result for this operation. However, using @ often makes the code more readable, as it is designed specifically for matrix multiplication.
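As a quick sanity check that both spellings agree for this matrix and vector:
import numpy as np
A = np.array([[3, 1, 4],
              [0, 2, 5]])
v = np.array([2, 6, 1])
# np.dot and the @ operator produce the same result for a 2-D matrix times a 1-D vector
print(np.array_equal(np.dot(A, v), A @ v))  # True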