Imagine your dataset has hundreds of features. For example, a real estate dataset might include square footage, number of rooms, age of the building, distance to the nearest school, local crime rate, and dozens of other variables. While more data can be useful, having too many features, or dimensions, can make it difficult for machine learning models to learn effectively. It increases computation time and can lead to the "curse of dimensionality": as the number of dimensions grows, the data becomes increasingly sparse, and models become more likely to fit noise instead of the underlying signal.
This is where dimensionality reduction comes in. The goal is to reduce the number of features while preserving as much of the important information in the dataset as possible. One of the most common and effective techniques for this is Principal Component Analysis (PCA).
At its core, PCA is a technique that transforms your data into a new set of features, called principal components. These new components are ordered by how much of the original data's variance they capture. The first principal component (PC1) is engineered to capture the largest possible variance. The second principal component (PC2) captures the next largest variance, with the condition that it must be orthogonal (perpendicular) to the first. This continues for all components.
This process gives you a ranked list of components. To reduce dimensionality, you simply keep the first few components that capture the majority of the information and discard the rest.
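To make this concrete, here is a minimal sketch using scikit-learn's PCA class; the synthetic five-feature dataset is purely illustrative. It fits PCA, prints how much variance each ranked component explains, and then keeps only enough components to retain roughly 95% of the variance.

```python
import numpy as np
from sklearn.decomposition import PCA

# Illustrative dataset: 200 samples with 5 highly correlated features.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 1))
X = np.hstack([base + 0.1 * rng.normal(size=(200, 1)) for _ in range(5)])

# Fit PCA and inspect how much variance each ranked component captures.
pca = PCA(n_components=5)
pca.fit(X)
print(pca.explained_variance_ratio_)  # sorted from largest to smallest

# Keep only as many components as needed to retain ~95% of the variance.
pca_reduced = PCA(n_components=0.95)
X_reduced = pca_reduced.fit_transform(X)
print(X_reduced.shape)  # far fewer columns than the original 5
```

Because the synthetic features are built from the same underlying signal, the first component captures almost all of the variance and the reduced dataset ends up with a single column.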
This is where the eigenvectors and eigenvalues we learned about in Chapter 5 become incredibly useful. PCA works by analyzing the relationships between the features in your dataset. These relationships are captured in a matrix called the covariance matrix.
The eigenvectors of this covariance matrix point in the directions of the highest variance in your data. In fact, these eigenvectors are the principal components. The eigenvector with the largest corresponding eigenvalue is the first principal component, as it points in the direction of the greatest "spread" in the data. The eigenvector with the second-largest eigenvalue is the second principal component, and so on.
The eigenvalues themselves tell you the amount of variance captured by each principal component. A large eigenvalue means its corresponding eigenvector (and therefore that principal component) accounts for a large share of the total variance in the data.
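A small NumPy sketch can show this relationship directly. The two-feature dataset below is a made-up example, but the steps it walks through, centering the data, forming the covariance matrix, and taking its eigendecomposition, are exactly the quantities described above.

```python
import numpy as np

# Illustrative 2-feature dataset with strong correlation between the features.
rng = np.random.default_rng(1)
x = rng.normal(size=300)
y = 0.8 * x + 0.2 * rng.normal(size=300)
X = np.column_stack([x, y])

# Center the data, then compute the covariance matrix of the features.
X_centered = X - X.mean(axis=0)
cov = np.cov(X_centered, rowvar=False)  # shape (2, 2)

# Eigendecomposition: eigenvectors are the principal components,
# eigenvalues are the variance captured along each one.
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# eigh returns eigenvalues in ascending order, so reverse to put PC1 first.
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

print("variance per component:", eigenvalues)
print("PC1 direction:", eigenvectors[:, 0])
```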
Let's look at a simple 2D dataset. Imagine plotting two features against each other, and the data points form an elongated cloud.
(Figure: a scatter plot of the two features. The principal components, drawn as red and orange arrows, identify the axes of greatest variance in the data. PC1 captures the most spread, while PC2 captures the next most.)
In this plot, you can see that the data varies most along the direction of the red arrow (PC1). There is much less variation along the orange arrow (PC2). If we wanted to reduce this dataset from two dimensions to one, we could project all the data points onto the line defined by PC1. We would lose the information related to PC2, but since PC1 captures most of the variance, we would retain the most important structure of our data.
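Assuming the same kind of 2D data as in the figure, that projection is just a dot product with the leading eigenvector. The sketch below reduces each 2D point to a single coordinate along the PC1 direction.

```python
import numpy as np

# Same illustrative 2D setup as before.
rng = np.random.default_rng(1)
x = rng.normal(size=300)
y = 0.8 * x + 0.2 * rng.normal(size=300)
X = np.column_stack([x, y])
X_centered = X - X.mean(axis=0)

# PC1 is the eigenvector of the covariance matrix with the largest eigenvalue.
eigenvalues, eigenvectors = np.linalg.eigh(np.cov(X_centered, rowvar=False))
pc1 = eigenvectors[:, np.argmax(eigenvalues)]  # direction of greatest spread

# Projecting onto PC1 gives one number per sample: the data is now 1D.
X_1d = X_centered @ pc1
print(X.shape, "->", X_1d.shape)  # (300, 2) -> (300,)
```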
Here is a high-level summary of how PCA is performed (a short sketch implementing these steps follows the list):

1. Standardize the data so that each feature contributes on a comparable scale.
2. Compute the covariance matrix of the features.
3. Compute the eigenvectors and eigenvalues of the covariance matrix.
4. Sort the eigenvectors by their eigenvalues in descending order; these are the principal components, ranked by how much variance they capture.
5. Select the top components and project the data onto them to form the reduced dataset.
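As a rough sketch, these steps translate almost line for line into NumPy. The function name pca_fit_transform and the choice to only center (rather than fully standardize) the data are illustrative assumptions, not a fixed recipe.

```python
import numpy as np

def pca_fit_transform(X, n_components):
    """Minimal PCA sketch following the steps above (illustrative, not optimized)."""
    # 1. Center the data (optionally also divide by each feature's standard deviation).
    X_centered = X - X.mean(axis=0)

    # 2. Covariance matrix of the features.
    cov = np.cov(X_centered, rowvar=False)

    # 3. Eigenvalues and eigenvectors of the covariance matrix.
    eigenvalues, eigenvectors = np.linalg.eigh(cov)

    # 4. Sort components by descending eigenvalue (variance captured).
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

    # 5. Keep the top components and project the data onto them.
    W = eigenvectors[:, :n_components]
    return X_centered @ W, eigenvalues

# Example: reduce an illustrative 5-feature dataset to 2 components.
rng = np.random.default_rng(2)
X = rng.normal(size=(100, 5))
X_reduced, variances = pca_fit_transform(X, n_components=2)
print(X_reduced.shape)  # (100, 2)
print(variances)        # variance captured by each ranked component
```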
By applying PCA, we use the foundations of linear algebra, specifically eigenvalues and eigenvectors, to simplify complex datasets. This makes them easier to visualize and faster to process, and it can often improve machine learning model performance by filtering out noise.