Matrix multiplication is a fundamental operation that combines two matrices. Unlike simpler element-by-element operations, matrix multiplication relies on a specific rule involving the rows of the first matrix and the columns of the second, rather than just corresponding entries. This operation is central to many concepts in machine learning, such as transforming data points, chaining computational steps in neural networks, and expressing systems of linear equations.
Before you can multiply two matrices, say $A$ and $B$, they must satisfy a specific condition regarding their dimensions. If matrix $A$ has dimensions $m \times n$ (meaning $m$ rows and $n$ columns) and matrix $B$ has dimensions $n \times p$ ($n$ rows and $p$ columns), then the matrix product $AB$ is defined.

The critical part is that the number of columns in the first matrix ($A$) must equal the number of rows in the second matrix ($B$). In this case, both are $n$.

The resulting matrix, let's call it $C$, will have dimensions $m \times p$. It will have the same number of rows as the first matrix ($m$) and the same number of columns as the second matrix ($p$).

If the inner dimensions don't match (the number of columns of $A$ does not equal the number of rows of $B$), the matrices cannot be multiplied in that order.
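As a quick illustration of this rule, the short NumPy sketch below (the array names and random values are chosen purely for demonstration) shows that a $2 \times 3$ matrix multiplies a $3 \times 2$ matrix without complaint, while multiplying it by a $2 \times 2$ matrix raises a shape error:

```python
import numpy as np

M = np.random.rand(2, 3)   # 2 x 3
N = np.random.rand(3, 2)   # 3 x 2: inner dimensions match (3 == 3)
X = np.random.rand(2, 2)   # 2 x 2: inner dimensions do not match (3 != 2)

print((M @ N).shape)       # (2, 2): rows of M, columns of N

try:
    M @ X                  # columns of M (3) != rows of X (2)
except ValueError as err:
    print(f"Cannot multiply: {err}")
```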
How do we find the values inside the resulting matrix $C$? Each element $C_{ij}$ (the element in the $i$-th row and $j$-th column of $C$) is calculated by taking the dot product of the $i$-th row of matrix $A$ and the $j$-th column of matrix $B$.

Remember the dot product of two vectors $u = (u_1, u_2, \dots, u_n)$ and $v = (v_1, v_2, \dots, v_n)$ is $u \cdot v = u_1 v_1 + u_2 v_2 + \dots + u_n v_n$.
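If you want to compute a dot product directly, NumPy's np.dot handles 1D vectors. The vectors below happen to be the first row of the matrix $A$ and the first column of the matrix $B$ used in the worked example later in this section:

```python
import numpy as np

u = np.array([1, 2, 3])  # first row of A
v = np.array([7, 9, 2])  # first column of B

# 1*7 + 2*9 + 3*2 = 31
print(np.dot(u, v))  # 31
```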
For matrix multiplication $C = AB$, where $A$ is $m \times n$ and $B$ is $n \times p$, the entries are defined as follows. If $a_{ik}$ is the element in the $i$-th row and $k$-th column of $A$, and $b_{kj}$ is the element in the $k$-th row and $j$-th column of $B$, then:

$$C_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj} = a_{i1} b_{1j} + a_{i2} b_{2j} + \dots + a_{in} b_{nj}$$

This means you multiply corresponding elements from the row of $A$ and the column of $B$ and then sum up those products.
A visual representation of how row $i$ of matrix $A$ and column $j$ of matrix $B$ combine via the dot product to compute the element $C_{ij}$ in the resulting matrix $C$.
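To make the formula concrete, here is a minimal pure-Python sketch of the triple loop it describes. The helper name matmul_naive is made up for illustration; it is not how NumPy computes products internally:

```python
def matmul_naive(A, B):
    """Multiply matrices given as lists of lists, following C_ij = sum_k a_ik * b_kj."""
    m, n, p = len(A), len(B), len(B[0])
    assert len(A[0]) == n, "columns of A must equal rows of B"
    C = [[0] * p for _ in range(m)]
    for i in range(m):          # row index of A (and of C)
        for j in range(p):      # column index of B (and of C)
            for k in range(n):  # shared inner dimension
                C[i][j] += A[i][k] * B[k][j]
    return C
```

For example, `matmul_naive([[1, 2, 3], [4, 5, 6]], [[7, 8], [9, 1], [2, 3]])` returns `[[31, 19], [85, 55]]`, matching the worked example below.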
Let's multiply a $2 \times 3$ matrix $A$ by a $3 \times 2$ matrix $B$. The result should be a $2 \times 2$ matrix.

$$A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}, \qquad B = \begin{bmatrix} 7 & 8 \\ 9 & 1 \\ 2 & 3 \end{bmatrix}$$

The resulting matrix $C = AB$ is $2 \times 2$:

$$C = \begin{bmatrix} c_{11} & c_{12} \\ c_{21} & c_{22} \end{bmatrix}$$

Let's calculate each element:

$$c_{11} = (1)(7) + (2)(9) + (3)(2) = 7 + 18 + 6 = 31$$
$$c_{12} = (1)(8) + (2)(1) + (3)(3) = 8 + 2 + 9 = 19$$
$$c_{21} = (4)(7) + (5)(9) + (6)(2) = 28 + 45 + 12 = 85$$
$$c_{22} = (4)(8) + (5)(1) + (6)(3) = 32 + 5 + 18 = 55$$

So, the resulting matrix is:

$$C = AB = \begin{bmatrix} 31 & 19 \\ 85 & 55 \end{bmatrix}$$
NumPy makes matrix multiplication straightforward. The standard way to perform matrix multiplication between two NumPy arrays (representing matrices) is the @ operator, available since Python 3.5.
Let's perform the same calculation as above using NumPy:
```python
import numpy as np

# Define matrices A and B
A = np.array([[1, 2, 3],
              [4, 5, 6]])

B = np.array([[7, 8],
              [9, 1],
              [2, 3]])

# Check shapes
print(f"Shape of A: {A.shape}")  # Output: Shape of A: (2, 3)
print(f"Shape of B: {B.shape}")  # Output: Shape of B: (3, 2)

# Perform matrix multiplication using the @ operator
C = A @ B

print(f"\nMatrix A:\n{A}")
print(f"Matrix B:\n{B}")
print(f"Result C = A @ B:\n{C}")
# Output:
# Result C = A @ B:
# [[31 19]
#  [85 55]]

print(f"Shape of C: {C.shape}")  # Output: Shape of C: (2, 2)
```
The result matches our manual calculation. Notice how NumPy handles the row-by-column dot products internally.
You might also encounter np.dot(A, B) or A.dot(B). For 2D arrays (matrices), these functions perform standard matrix multiplication, just like the @ operator. However, the @ operator is generally preferred for matrix multiplication because it's unambiguous, whereas np.dot behaves differently for arrays with more than two dimensions. For clarity when working with matrices, stick with @.
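As a quick sanity check, reusing the A and B defined above, you can confirm that the different spellings agree for 2D arrays:

```python
# For 2D arrays, np.dot, the .dot method, and @ produce identical results
print(np.array_equal(A @ B, np.dot(A, B)))  # True
print(np.array_equal(A @ B, A.dot(B)))      # True
```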
Unlike multiplication with regular numbers (scalars), where $xy = yx$, matrix multiplication is generally not commutative. This means that, in most cases:

$$AB \neq BA$$

Sometimes, $BA$ might not even be defined when $AB$ is. For instance, in our example above, $A$ is $2 \times 3$ and $B$ is $3 \times 2$. The product $AB$ is defined and results in a $2 \times 2$ matrix.

What about $BA$? Here, $B$ is $3 \times 2$ and $A$ is $2 \times 3$. The inner dimensions match (2 and 2), so $BA$ is defined. The resulting matrix will be $3 \times 3$.

Since $AB$ is $2 \times 2$ and $BA$ is $3 \times 3$, they clearly cannot be equal. Let's compute $BA$ with NumPy to see:
```python
# Calculate BA (note the order)
C_BA = B @ A

print(f"\nResult BA = B @ A:\n{C_BA}")
# Output:
# Result BA = B @ A:
# [[39 54 69]
#  [13 23 33]
#  [14 19 24]]

print(f"Shape of BA: {C_BA.shape}")  # Output: Shape of BA: (3, 3)
```
As expected, $BA$ is a $3 \times 3$ matrix and is completely different from $AB$.
Even if $A$ and $B$ are square matrices of the same size, where both $AB$ and $BA$ are defined and have the same dimensions, the results will usually be different. The order in which you multiply matrices matters significantly. This has important implications in areas like computer graphics and machine learning, where sequences of matrix operations represent sequences of transformations or computational steps. Changing the order changes the outcome.
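A tiny check with two $2 \times 2$ matrices, chosen arbitrarily for this sketch, makes the point:

```python
import numpy as np

P = np.array([[1, 2],
              [3, 4]])
Q = np.array([[0, 1],
              [1, 0]])  # a permutation matrix

print(P @ Q)  # multiplying by Q on the right swaps the columns of P
# [[2 1]
#  [4 3]]

print(Q @ P)  # multiplying by Q on the left swaps the rows of P
# [[3 4]
#  [1 2]]
```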