Matrix multiplication is one of the most fundamental operations in linear algebra, particularly when working with data transformations and machine learning models. Unlike element-wise multiplication (often called the Hadamard product), matrix multiplication has a specific definition that allows us to combine linear transformations or process data in sophisticated ways.
The Row-by-Column Rule
The core idea behind multiplying two matrices, say A and B, to get a product C=AB, is based on computing dot products between the rows of A and the columns of B.
For this multiplication to be defined, the number of columns in the first matrix (A) must exactly match the number of rows in the second matrix (B). If A is an m×n matrix (meaning m rows and n columns) and B is an n×p matrix (n rows and p columns), their product C=AB will be an m×p matrix. The "inner" dimensions (n) must match, and the "outer" dimensions (m and p) determine the shape of the result.
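This shape rule is easy to check numerically. The sketch below uses NumPy with arbitrary example shapes (3×4 and 4×2) purely for illustration:

```python
import numpy as np

# A is m x n (3 x 4) and B is n x p (4 x 2): the inner dimensions (4) match,
# so the product is defined.
A = np.ones((3, 4))
B = np.ones((4, 2))

C = A @ B
print(C.shape)  # the result takes the outer dimensions: (3, 2)
```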
The element in the i-th row and j-th column of the resulting matrix C, denoted $C_{ij}$, is calculated by taking the dot product of the i-th row of A and the j-th column of B.
Mathematically, if $A = [a_{ik}]$ and $B = [b_{kj}]$, then the element $C_{ij}$ of the product $C = AB$ is given by:

$$C_{ij} = \sum_{k=1}^{n} a_{ik}\,b_{kj}$$
This means you multiply corresponding elements from the i-th row of A and the j-th column of B and then sum up those products.
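The rule above translates directly into a nested loop over rows of A, columns of B, and the shared inner index. This is a plain-Python illustration (`matmul` is a hypothetical helper name, not a library function):

```python
def matmul(A, B):
    """Multiply two matrices (given as lists of rows) by the row-by-column rule."""
    m, n = len(A), len(A[0])
    n2, p = len(B), len(B[0])
    if n != n2:
        raise ValueError("inner dimensions must match")
    # C[i][j] is the dot product of row i of A and column j of B.
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(m)]
```

In practice you would use an optimized library routine, but the loop makes the summation over the shared index k explicit.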
A Concrete Example
Let's compute the product of two matrices, A and B:
$$A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}, \qquad B = \begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix}$$
Here, A is 2×2 and B is 2×2. The inner dimensions match (both are 2), so the multiplication is defined. The resulting matrix C=AB will be 2×2.
Let's calculate the elements of $C = \begin{bmatrix} c_{11} & c_{12} \\ c_{21} & c_{22} \end{bmatrix}$:

- $c_{11}$: Dot product of Row 1 of A and Column 1 of B.
  $c_{11} = (1 \times 5) + (2 \times 7) = 5 + 14 = 19$
- $c_{12}$: Dot product of Row 1 of A and Column 2 of B.
  $c_{12} = (1 \times 6) + (2 \times 8) = 6 + 16 = 22$
- $c_{21}$: Dot product of Row 2 of A and Column 1 of B.
  $c_{21} = (3 \times 5) + (4 \times 7) = 15 + 28 = 43$
- $c_{22}$: Dot product of Row 2 of A and Column 2 of B.
  $c_{22} = (3 \times 6) + (4 \times 8) = 18 + 32 = 50$
So, the resulting matrix is:
$$C = AB = \begin{bmatrix} 19 & 22 \\ 43 & 50 \end{bmatrix}$$
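You can verify this result with NumPy, whose `@` operator performs matrix multiplication:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

C = A @ B
print(C)
# [[19 22]
#  [43 50]]
```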
Visualizing the Process
We can visualize the calculation of a single element, like c11, as combining the first row of A and the first column of B:
Figure: calculation of the top-left element ($c_{11}$) by taking the dot product of the first row of A (blue) and the first column of B (red).
Important Properties and Considerations
- Dimension Compatibility: Remember, the product AB is only defined if the number of columns of A equals the number of rows of B. If the dimensions don't align, the multiplication cannot be performed.
- Non-Commutativity: Unlike multiplication of scalar numbers, matrix multiplication is generally not commutative. That is, $AB \neq BA$ in most cases. Let's swap the order for our example matrices A and B:
$$BA = \begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix} \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$$

- Element (1,1): $(5 \times 1) + (6 \times 3) = 5 + 18 = 23$
- Element (1,2): $(5 \times 2) + (6 \times 4) = 10 + 24 = 34$
- Element (2,1): $(7 \times 1) + (8 \times 3) = 7 + 24 = 31$
- Element (2,2): $(7 \times 2) + (8 \times 4) = 14 + 32 = 46$

$$BA = \begin{bmatrix} 23 & 34 \\ 31 & 46 \end{bmatrix}$$
Clearly, $AB = \begin{bmatrix} 19 & 22 \\ 43 & 50 \end{bmatrix} \neq \begin{bmatrix} 23 & 34 \\ 31 & 46 \end{bmatrix} = BA$. The order of multiplication matters significantly.
- Associativity: Matrix multiplication is associative: (AB)C = A(BC), provided the dimensions are compatible for all multiplications. This property is useful because it means we can group sequences of matrix multiplications without changing the final result.
- Distributivity: Matrix multiplication distributes over matrix addition: A(B+C) = AB + AC and (A+B)C = AC + BC, again assuming compatible dimensions.
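All of these properties can be checked numerically with the example matrices from above, plus a third 2×2 matrix (reusing the letter C for a new matrix here, chosen purely for illustration):

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
C = np.array([[1, 0], [2, 1]])  # extra matrix for the three-factor properties

print(np.array_equal(A @ B, B @ A))                # False: not commutative
print(np.array_equal((A @ B) @ C, A @ (B @ C)))    # True: associative
print(np.array_equal(A @ (B + C), A @ B + A @ C))  # True: left-distributive
print(np.array_equal((A + B) @ C, A @ C + B @ C))  # True: right-distributive
```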
Significance in Machine Learning
Matrix multiplication isn't just an abstract mathematical rule; it's central to many machine learning operations:
- Applying Linear Transformations: As discussed in the next section, multiplying a matrix A by a vector x (which is just an n×1 matrix) results in a new vector y=Ax. This represents applying the linear transformation defined by A to the vector x. If you have a dataset represented by a matrix X (where each column is a data point), calculating AX applies the transformation A to all data points simultaneously.
- Composition of Transformations: To apply transformation B and then transformation A, the combined transformation is represented by the single matrix product AB. This is heavily used in neural networks, where data passes through multiple layers, each performing a linear transformation (often followed by a non-linear activation).
- Solving Linear Systems: The equation Ax=b represents a system of linear equations. Matrix multiplication defines how the coefficients in A combine with the variables in x to produce the outcomes in b.
- Calculating Covariance Matrices: In statistics and techniques like PCA, calculating the covariance matrix often involves matrix multiplication (e.g., $X^T X$ after centering the data X).
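A small NumPy sketch ties these uses together. The specific matrices here (a 90-degree rotation, a uniform scaling, and a tiny data set of three 2-D points) are illustrative choices, not taken from the text:

```python
import numpy as np

A = np.array([[0.0, -1.0], [1.0, 0.0]])  # rotation by 90 degrees
B = np.array([[2.0, 0.0], [0.0, 2.0]])   # uniform scaling by 2

X = np.array([[1.0, 0.0, 3.0],
              [0.0, 2.0, 1.0]])          # each column is a data point

Y = A @ X     # applies the transformation A to all columns of X at once
AB = A @ B    # "apply B, then A" collapsed into a single matrix

Xc = X - X.mean(axis=1, keepdims=True)   # center the data
cov = Xc @ Xc.T / (X.shape[1] - 1)       # sample covariance matrix (2 x 2)
```

Note that transforming the whole data set is one matrix product rather than a loop over points, which is exactly why these operations dominate machine-learning workloads.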
Understanding the mechanics and properties of matrix multiplication is therefore essential for comprehending how many machine learning algorithms process and transform data. In the practical sections that follow, you'll use libraries like NumPy, which implement these operations highly efficiently.