While a vector is perfect for representing a single data point with multiple features, machine learning models are rarely trained on just one example. We need a way to organize an entire collection of data points, and for that, we turn to matrices.
A matrix is a rectangular grid of numbers arranged in rows and columns. You can think of it as a generalization of a vector. If a vector is a single list of numbers, a matrix is a collection of lists stacked on top of each other. This structure is fundamental for representing datasets in machine learning.
Let’s say we are building a model to predict house prices. We collect data for several houses. Each house is a single sample, and for each sample, we record a few features like its size, the number of bedrooms, and its age. We can represent each house with a feature vector:
- [1500, 3, 10] (1500 sq ft, 3 bedrooms, 10 years old)
- [2100, 4, 5] (2100 sq ft, 4 bedrooms, 5 years old)
- [1200, 2, 20] (1200 sq ft, 2 bedrooms, 20 years old)

To organize this entire dataset, we can stack these row vectors on top of each other to form a matrix.
A dataset organized into a matrix. Each row corresponds to a single data sample (a house), and each column corresponds to a specific feature.
In this matrix:

- Each row represents a single sample (one house).
- Each column represents a single feature (size, number of bedrooms, or age).
The dimensions or shape of a matrix are given by its number of rows and columns. We describe it as an "m×n" matrix, where m is the number of rows and n is the number of columns. The housing data matrix above is a 3×3 matrix.
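We will set up Python properly in the next section, but as a preview, here is a minimal sketch of how this stacking looks in NumPy (assuming NumPy is installed): each house's feature vector becomes one row, and the array's `shape` attribute reports the m×n dimensions directly.

```python
import numpy as np

# Stack the three feature vectors (one row per house) into a matrix.
A = np.array([
    [1500, 3, 10],   # house 1: size (sq ft), bedrooms, age (years)
    [2100, 4, 5],    # house 2
    [1200, 2, 20],   # house 3
])

print(A.shape)  # (3, 3): 3 rows (samples) by 3 columns (features)
```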
In linear algebra, we typically denote matrices with uppercase letters, such as $A$. To refer to a specific element within the matrix, we use subscripts. The element in the $i$-th row and $j$-th column of matrix $A$ is denoted as $A_{ij}$ or $a_{ij}$. Remember the convention: row first, column second.
For our housing data matrix A:
$$A = \begin{bmatrix} 1500 & 3 & 10 \\ 2100 & 4 & 5 \\ 1200 & 2 & 20 \end{bmatrix}$$

This row-per-sample and column-per-feature arrangement is the standard convention in machine learning. This data matrix is often called the feature matrix and is commonly represented by the variable X.
The values we want to predict, like the price of each house, are typically stored in a separate column vector. This is often called the target vector and is represented by the variable y.
For our example, the feature matrix X would be the matrix we've already seen, and the target vector y might contain the corresponding prices:
$$X = \begin{bmatrix} 1500 & 3 & 10 \\ 2100 & 4 & 5 \\ 1200 & 2 & 20 \end{bmatrix}, \qquad y = \begin{bmatrix} 300000 \\ 450000 \\ 220000 \end{bmatrix}$$

This structured representation of features (X) and targets (y) is the input for nearly all supervised machine learning algorithms. By organizing our data into matrices and vectors, we can use the powerful operations of linear algebra to analyze relationships and build predictive models. Next, we'll get our environment ready to create and manipulate these objects using Python.
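As a preview of the representation we will use throughout, here is a minimal NumPy sketch of the feature matrix and target vector. The key invariant is that row i of X and element i of y describe the same house, so their row counts must match.

```python
import numpy as np

# Feature matrix X: one row per house, one column per feature.
X = np.array([[1500, 3, 10],
              [2100, 4, 5],
              [1200, 2, 20]])

# Target vector y: the price of each house, in the same row order as X.
y = np.array([300000, 450000, 220000])

# Sample i in X pairs with price y[i], so the row counts must agree.
assert X.shape[0] == y.shape[0]
print(X.shape, y.shape)  # (3, 3) (3,)
```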