While a vector is perfect for representing a single data point with multiple features, machine learning models are rarely trained on just one example. We need a way to organize an entire collection of data points, and for that, we turn to matrices.
A matrix is a rectangular grid of numbers arranged in rows and columns. You can think of it as a generalization of a vector. If a vector is a single list of numbers, a matrix is a collection of lists stacked on top of each other. This structure is fundamental for representing datasets in machine learning.
Let’s say we are building a model to predict house prices. We collect data for several houses. Each house is a single sample, and for each sample, we record a few features like its size, the number of bedrooms, and its age. We can represent each house with a feature vector:
- [1500, 3, 10] (1500 sq ft, 3 bedrooms, 10 years old)
- [2100, 4, 5] (2100 sq ft, 4 bedrooms, 5 years old)
- [1200, 2, 20] (1200 sq ft, 2 bedrooms, 20 years old)

To organize this entire dataset, we can stack these row vectors on top of each other to form a matrix.
A dataset organized into a matrix. Each row corresponds to a single data sample (a house), and each column corresponds to a specific feature.
In this matrix:

- Each row represents a single sample (one house).
- Each column represents a single feature (size, number of bedrooms, or age).
The dimensions or shape of a matrix are given by its number of rows and columns. We describe it as an "m×n" matrix, where m is the number of rows and n is the number of columns. The housing data matrix above is a 3×3 matrix.
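We will set up Python properly in the next section, but as a preview, here is a minimal sketch of how this stacking looks in NumPy (assuming NumPy is installed): each house's feature vector becomes one row, and the array's `shape` attribute reports the m×n dimensions directly.

```python
import numpy as np

# Stack the three feature vectors (one row per house) into a matrix.
A = np.array([
    [1500, 3, 10],   # house 1: size (sq ft), bedrooms, age (years)
    [2100, 4, 5],    # house 2
    [1200, 2, 20],   # house 3
])

print(A.shape)  # (3, 3): 3 rows (samples) by 3 columns (features)
```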
In linear algebra, we typically denote matrices with uppercase letters, such as $A$. To refer to a specific element within the matrix, we use subscripts. The element in the $i$-th row and $j$-th column of matrix $A$ is denoted as $A_{ij}$ or $a_{ij}$. Remember the convention: row first, column second.
For our housing data matrix A:
$$A = \begin{bmatrix} 1500 & 3 & 10 \\ 2100 & 4 & 5 \\ 1200 & 2 & 20 \end{bmatrix}$$

This row-per-sample and column-per-feature arrangement is the standard convention in machine learning. This data matrix is often called the feature matrix and is commonly represented by the variable X.
The values we want to predict, like the price of each house, are typically stored in a separate column vector. This is often called the target vector and is represented by the variable y.
For our example, the feature matrix X would be the matrix we've already seen, and the target vector y might contain the corresponding prices:
$$X = \begin{bmatrix} 1500 & 3 & 10 \\ 2100 & 4 & 5 \\ 1200 & 2 & 20 \end{bmatrix}, \qquad y = \begin{bmatrix} 300000 \\ 450000 \\ 220000 \end{bmatrix}$$

This structured representation of features (X) and targets (y) is the input for nearly all supervised machine learning algorithms. By organizing our data into matrices and vectors, we can use the powerful operations of linear algebra to analyze relationships and build predictive models. Next, we'll get our environment ready to create and manipulate these objects using Python.
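As a preview of the representation we will use throughout, here is a minimal NumPy sketch of the feature matrix and target vector. The key invariant is that row i of X and element i of y describe the same house, so their row counts must match.

```python
import numpy as np

# Feature matrix X: one row per house, one column per feature.
X = np.array([[1500, 3, 10],
              [2100, 4, 5],
              [1200, 2, 20]])

# Target vector y: the price of each house, in the same row order as X.
y = np.array([300000, 450000, 220000])

# Sample i in X pairs with price y[i], so the row counts must agree.
assert X.shape[0] == y.shape[0]
print(X.shape, y.shape)  # (3, 3) (3,)
```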