Matrix-matrix multiplication composes two transformations into one. This operation is fundamental in machine learning, especially in neural networks, where passing data through successive layers is essentially a series of matrix multiplications.
Before you can multiply two matrices, they must be compatible. This is the most important rule to remember. If you have a matrix A with dimensions m×n (meaning m rows and n columns) and a matrix B with dimensions n×p, you can multiply them to get a new matrix C with dimensions m×p.
The rule is simple: The number of columns in the first matrix must equal the number of rows in the second matrix. We call these the "inner dimensions."
$$A_{m \times n} \cdot B_{n \times p} = C_{m \times p}$$
The "outer dimensions," m and p, determine the shape of the final matrix. If the inner dimensions do not match, the multiplication is undefined.
The number of columns in the first matrix must align with the number of rows in the second matrix for the multiplication to be valid. The resulting matrix inherits the rows of the first and the columns of the second.
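Before working through the arithmetic, here is a minimal sketch of the compatibility check itself. The helper name can_multiply and the use of nested Python lists are illustrative choices for this sketch, not a standard API:
def can_multiply(A, B):
    # A is stored as a list of rows, so len(A[0]) is its number of columns,
    # and len(B) is the number of rows in B: these are the inner dimensions.
    return len(A[0]) == len(B)

A = [[1, 2, 3], [4, 5, 6]]      # 2x3
B = [[7, 8], [9, 1], [2, 3]]    # 3x2
print(can_multiply(A, B))       # True: inner dimensions are 3 and 3
print(can_multiply(A, A))       # False: 3 columns vs. 2 rows, undefined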
The actual calculation is an extension of the matrix-vector multiplication you saw earlier. To get the element in the i-th row and j-th column of the resulting matrix C, you calculate the dot product of the i-th row of matrix A and the j-th column of matrix B.
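That rule translates directly into code. The following is a minimal sketch in plain Python, assuming the matrices are stored as nested lists of rows; the function name matmul is just a placeholder for illustration:
def matmul(A, B):
    m, n = len(A), len(A[0])            # A is m x n
    p = len(B[0])                       # B is n x p (we assume len(B) == n)
    C = [[0] * p for _ in range(m)]     # result C is m x p
    for i in range(m):
        for j in range(p):
            # C[i][j] is the dot product of row i of A and column j of B
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]
    return C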
Let's walk through an example. Suppose we want to compute the product C=AB, where:
$$A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix} \quad\text{and}\quad B = \begin{pmatrix} 7 & 8 \\ 9 & 1 \\ 2 & 3 \end{pmatrix}$$
First, check the dimensions. A is a 2×3 matrix and B is a 3×2 matrix. The inner dimensions match (3 and 3), so the operation is valid. The resulting matrix C will have dimensions 2×2.
$$C = \begin{pmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{pmatrix}$$
Now let's compute each element of C:
To find $C_{11}$ (1st row, 1st column): Take the dot product of the 1st row of A and the 1st column of B.
$$C_{11} = (1 \cdot 7) + (2 \cdot 9) + (3 \cdot 2) = 7 + 18 + 6 = 31$$
To find $C_{12}$ (1st row, 2nd column): Take the dot product of the 1st row of A and the 2nd column of B.
$$C_{12} = (1 \cdot 8) + (2 \cdot 1) + (3 \cdot 3) = 8 + 2 + 9 = 19$$
To find $C_{21}$ (2nd row, 1st column): Take the dot product of the 2nd row of A and the 1st column of B.
$$C_{21} = (4 \cdot 7) + (5 \cdot 9) + (6 \cdot 2) = 28 + 45 + 12 = 85$$
To find $C_{22}$ (2nd row, 2nd column): Take the dot product of the 2nd row of A and the 2nd column of B.
$$C_{22} = (4 \cdot 8) + (5 \cdot 1) + (6 \cdot 3) = 32 + 5 + 18 = 55$$
Putting it all together, our final matrix is:
$$C = \begin{pmatrix} 31 & 19 \\ 85 & 55 \end{pmatrix}$$
Matrix multiplication has some properties that are different from the multiplication of regular numbers (scalars).
For scalars, 3 × 5 is the same as 5 × 3. This is not true for matrices: in general, $AB \neq BA$. This is one of the most significant differences.
Using our previous example, let's try to compute BA:
$$B_{3 \times 2} \cdot A_{2 \times 3}$$
The inner dimensions (2 and 2) match, so we can perform this multiplication. The result will be a 3×3 matrix, which is a different shape than the 2×2 matrix we got from AB. Since the results have different shapes, they cannot be equal. Even if the shapes were the same, the values would likely be different.
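A quick numerical check makes this concrete. The sketch below reuses A and B from the example (NumPy is introduced more fully further down); the shapes alone already show that AB and BA cannot be equal:
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6]])     # 2x3
B = np.array([[7, 8], [9, 1], [2, 3]])   # 3x2

print((A @ B).shape)   # (2, 2)
print((B @ A).shape)   # (3, 3) -- a different shape, so AB != BA here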
While the order of the matrices cannot be swapped, the way you group the multiplications does not matter as long as the matrices stay in the same sequence. This is called the associative property:
$$(AB)C = A(BC)$$
This is useful because it means you can group matrix multiplications in any way that is computationally efficient without changing the final result.
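As a sanity check, the sketch below multiplies three small random matrices both ways; the shapes (2×3, 3×4, 4×5) are arbitrary choices for illustration, and np.allclose is used because floating-point rounding can introduce tiny differences between the two groupings:
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3))
B = rng.standard_normal((3, 4))
C = rng.standard_normal((4, 5))

left = (A @ B) @ C    # multiply A and B first
right = A @ (B @ C)   # multiply B and C first
print(left.shape, right.shape)   # both (2, 5)
print(np.allclose(left, right))  # True: same result either way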
NumPy makes matrix multiplication straightforward. The modern and recommended way is to use the @ operator, which was introduced specifically for matrix multiplication.
import numpy as np
# Define our matrices A and B from the example
A = np.array([
[1, 2, 3],
[4, 5, 6]
])
B = np.array([
[7, 8],
[9, 1],
[2, 3]
])
# Perform matrix multiplication using the @ operator
C = A @ B
print("Matrix A (2x3):\n", A)
print("\nMatrix B (3x2):\n", B)
print("\nResult of A @ B (2x2):\n", C)
Output:
Matrix A (2x3):
[[1 2 3]
[4 5 6]]
Matrix B (3x2):
[[7 8]
[9 1]
[2 3]]
Result of A @ B (2x2):
[[31 19]
[85 55]]
The result matches our manual calculation perfectly.
You might also see the np.dot() function used for matrix multiplication. It works for both vector dot products and matrix multiplication, which can sometimes be confusing.
# Using np.dot() also works
C_dot = np.dot(A, B)
print("\nResult using np.dot(A, B):\n", C_dot)
For clarity and readability, it's best to use @ for matrix multiplication and np.dot() when you are explicitly calculating the dot product of two vectors.
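To illustrate that convention, the short sketch below uses np.dot for a vector dot product and @ for the matrix product; the vectors v and w are arbitrary examples chosen for this sketch:
import numpy as np

v = np.array([1, 2, 3])
w = np.array([4, 5, 6])
print(np.dot(v, w))    # 32: the scalar dot product of two vectors

A = np.array([[1, 2, 3], [4, 5, 6]])
B = np.array([[7, 8], [9, 1], [2, 3]])
print(A @ B)           # the 2x2 matrix product from the example above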