At its core, machine learning is about finding patterns in data. But for a computer to process this data, it must be converted into a structured, numerical format. This is where linear algebra provides the essential framework. It gives us the tools to organize information into clean, rectangular arrays of numbers that algorithms can work with.
Think about a simple dataset used for predicting house prices. For each house, we might record several attributes: its size in square feet, the number of bedrooms, and the number of bathrooms. In machine learning, these attributes are called features. A single house, with all its features, represents one observation or sample.
To a machine learning model, one specific house, say with 1,500 square feet, 3 bedrooms, and 2 bathrooms, is not a description. It's a point in space. We can represent this house as a feature vector, which is simply an ordered list of its numerical features.
$$\text{house}_1 = \begin{bmatrix} 1500 \\ 3 \\ 2 \end{bmatrix}$$

The order in the vector is important. We must decide on a consistent sequence, for example, [square footage, bedrooms, bathrooms]. Every single house in our dataset will be converted into a vector following this same structure. This allows us to compare different houses and perform mathematical operations on them consistently.
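As a minimal sketch in NumPy (the library this section uses later), the house above becomes a one-dimensional array:

```python
import numpy as np

# Feature vector for one house, in the agreed order:
# [square footage, bedrooms, bathrooms]
house_1 = np.array([1500, 3, 2])

print(house_1)        # [1500    3    2]
print(house_1.shape)  # (3,)
```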
A dataset rarely contains just one observation. Typically, we have hundreds, thousands, or even millions of them. Stacking all of our individual feature vectors together gives us a data matrix, which is almost always denoted by a capital letter, such as X.
By convention in machine learning, each row in the data matrix corresponds to a single observation (a house), and each column corresponds to a single feature (e.g., square footage).
If we have data for three houses:

- House 1: 1,500 sq ft, 3 bedrooms, 2 bathrooms
- House 2: 2,100 sq ft, 4 bedrooms, 3 bathrooms
- House 3: 1,200 sq ft, 2 bedrooms, 2 bathrooms
We can stack their feature vectors to form the data matrix X:
$$X = \begin{bmatrix} 1500 & 3 & 2 \\ 2100 & 4 & 3 \\ 1200 & 2 & 2 \end{bmatrix}$$

This matrix now contains our entire dataset of features. The dimensions of the matrix tell us about our dataset: if $X$ has shape $m \times n$, we have $m$ observations and $n$ features. In our example, $X$ is a $3 \times 3$ matrix.
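To make the stacking concrete, here is one way to build $X$ in NumPy with `np.vstack`, which places each feature vector as a row. Notice how the row/column convention falls out naturally when we slice:

```python
import numpy as np

# Feature vectors, one per house, in the order
# [square footage, bedrooms, bathrooms]
house_1 = np.array([1500, 3, 2])
house_2 = np.array([2100, 4, 3])
house_3 = np.array([1200, 2, 2])

# Stack the vectors as rows to form the data matrix X
X = np.vstack([house_1, house_2, house_3])

print(X.shape)   # (3, 3): m = 3 observations, n = 3 features
print(X[0])      # a row is one observation: [1500    3    2]
print(X[:, 0])   # a column is one feature:  [1500 2100 1200]
```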
In a broad class of machine learning tasks known as supervised learning, our goal is to predict a specific outcome. For our housing dataset, this would be the price of each house. This outcome is called the label or target.
Just as we did with the features, we collect the labels for each observation into a vector, commonly named $y$. If the prices for our three houses were \$300,000, \$450,000, and \$250,000, our target vector $y$ would be:
$$y = \begin{bmatrix} 300000 \\ 450000 \\ 250000 \end{bmatrix}$$

The first entry in $y$ corresponds to the first row in $X$, the second to the second, and so on. This alignment is critical for the model to learn the relationship between the features in $X$ and the outcomes in $y$; the code example later in this section checks it directly.
Diagram: raw data is split and organized into a feature matrix $X$ and a target vector $y$, which then serve as inputs to a machine learning model.
Structuring data this way might seem like a simple organizational step, but it's the foundation on which all modern machine learning is built: once data is in matrix form, the same operations (indexing, slicing, matrix products) apply uniformly to any dataset, and they run efficiently on hardware optimized for array computation.
In NumPy, creating these structures is straightforward. Our house data matrix and target vector would look like this:
```python
import numpy as np

# Data matrix X (3 observations, 3 features)
X = np.array([
    [1500, 3, 2],
    [2100, 4, 3],
    [1200, 2, 2]
])

# Target vector y (3 labels)
y = np.array([300000, 450000, 250000])

print("Data Matrix X:\n", X)
print("\nTarget Vector y:\n", y)
```
By organizing data into vectors and matrices, we translate a practical problem into a mathematical one. This allows us to use the full power of linear algebra to find patterns, make predictions, and build the intelligent systems that define machine learning.
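As a small illustration of that power (not a trained model), suppose a linear model had learned a hypothetical weight vector `w` assigning a dollar value to each feature. Predicting prices for every house then reduces to a single matrix-vector product:

```python
import numpy as np

X = np.array([
    [1500, 3, 2],
    [2100, 4, 3],
    [1200, 2, 2]
])

# Hypothetical weights: $150 per sq ft, $10,000 per bedroom,
# $5,000 per bathroom (made up for illustration only)
w = np.array([150, 10000, 5000])

# One matrix-vector product predicts prices for all houses at once
predictions = X @ w
print(predictions)  # [265000 370000 210000]
```

For a linear model, training amounts to finding a weight vector like this that makes the predictions match the targets in $y$ as closely as possible.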