Support Vector Machines (SVMs) offer another powerful and versatile approach to classification problems. While algorithms like Logistic Regression find a decision boundary, SVMs take a distinct approach: they aim to find the best possible separating line or hyperplane between classes. The "best" hyperplane is defined as the one that maximizes the distance, known as the margin, to the nearest data points of any class.
Imagine you have data points belonging to two different classes in a 2D plane. You want to draw a straight line that separates these two classes. It's likely that you could draw many possible lines. Which one is optimal?
SVM answers this by looking for the line that is as far as possible from the closest points in both classes. This distance from the separating line (hyperplane) to the nearest point is called the margin. The hyperplane that creates the largest margin is considered the optimal one. This is often called the Maximal Margin Hyperplane.
Why maximize the margin? Intuitively, a larger margin suggests a more confident separation between the classes. A decision boundary with a large margin is likely less sensitive to small perturbations in the data and may generalize better to new, unseen examples.
In a 2D feature space this separator is simply a line; in general, for an n-dimensional feature space, it is an (n−1)-dimensional hyperplane.
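To make the idea of "maximizing the margin" concrete, here is the standard hard-margin formulation, using the conventional notation (introduced here for illustration) where w is the weight vector, b is the bias, and the class labels are y_i ∈ {−1, +1}:

```latex
% Separating hyperplane and margin width
\[
  \text{hyperplane: } \mathbf{w}^\top \mathbf{x} + b = 0,
  \qquad
  \text{margin width} = \frac{2}{\lVert \mathbf{w} \rVert}
\]
% Maximizing the margin is equivalent to this constrained minimization
\[
  \min_{\mathbf{w},\, b} \; \tfrac{1}{2}\lVert \mathbf{w} \rVert^{2}
  \quad \text{subject to} \quad
  y_i\,\bigl(\mathbf{w}^\top \mathbf{x}_i + b\bigr) \ge 1 \;\; \text{for all } i
\]
```

Because the margin width is 2/‖w‖, making the margin as wide as possible is the same as making ‖w‖ as small as possible while still classifying every training point on the correct side of the margin.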
The data points that lie closest to the maximal margin hyperplane (exactly on the edge of the margin) are called support vectors. These are the critical data points in the dataset because they "support" or define the hyperplane. If you were to move any of the support vectors, the position of the optimal hyperplane would likely change. Conversely, moving data points that are not support vectors (and are further away from the margin) will not affect the hyperplane, as long as they don't cross the margin boundary. This property makes SVMs memory-efficient, especially in high-dimensional spaces, because the model definition only depends on these few support vectors.
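As a quick illustration of this, here is a minimal sketch (the dataset and parameters are arbitrary choices, not the article's exact setup) that fits a linear SVM on toy data and inspects which training points ended up as support vectors:

```python
# Fit a linear SVM on a simple two-class dataset and inspect its support vectors.
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated clusters of points; random_state is an arbitrary choice.
X, y = make_blobs(n_samples=60, centers=2, cluster_std=1.0, random_state=42)

clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

# Only these points define the hyperplane; the other points could move freely
# (as long as they do not cross the margin) without changing the boundary.
print("Number of support vectors per class:", clf.n_support_)
print("Support vector coordinates:\n", clf.support_vectors_)
```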
Let's visualize this concept in 2D:
This plot shows two classes of data points. The solid line represents the maximal margin hyperplane found by SVM. The dashed lines indicate the margin boundaries, and the points lying on these dashed lines are the support vectors (highlighted with open circles).
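A plot like this can be generated with Matplotlib and Scikit-learn. The sketch below reuses the toy setup from the previous snippet; the grid resolution, colormap, and marker sizes are illustrative choices:

```python
# Visualize the maximal margin hyperplane, the margin edges, and the support vectors.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=60, centers=2, cluster_std=1.0, random_state=42)
clf = SVC(kernel="linear", C=1.0).fit(X, y)

# Evaluate the decision function on a grid covering the data.
xx, yy = np.meshgrid(
    np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, 200),
    np.linspace(X[:, 1].min() - 1, X[:, 1].max() + 1, 200),
)
Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

plt.scatter(X[:, 0], X[:, 1], c=y, cmap="bwr", s=30)
# Solid line: decision boundary (level 0); dashed lines: margin edges (levels -1 and +1).
plt.contour(xx, yy, Z, levels=[-1, 0, 1], colors="k", linestyles=["--", "-", "--"])
# Open circles mark the support vectors.
plt.scatter(clf.support_vectors_[:, 0], clf.support_vectors_[:, 1],
            s=120, facecolors="none", edgecolors="k")
plt.show()
```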
The concept described above works beautifully when the data is linearly separable, meaning you can draw a single straight line (or hyperplane) to separate the classes perfectly. But what if the data isn't linearly separable?
Consider data arranged in concentric circles. No single straight line can separate the inner circle from the outer ring. SVMs handle this using a clever technique often called the kernel trick.
The core idea is to transform the original features into a higher-dimensional space where the data might become linearly separable. Imagine projecting the 2D concentric circle data into 3D in such a way that the inner circle points are at a different "height" than the outer ring points. In this new 3D space, you could potentially separate the classes with a simple plane (a hyperplane in 3D).
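The following small sketch (an assumed setup, not taken from the article) makes this projection explicit: adding a third feature equal to the squared distance from the origin lifts the concentric-circle data into 3D, where the two rings sit at different "heights" and a flat plane can separate them.

```python
# Lift 2D concentric-circle data into 3D with an explicit feature and separate it linearly.
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# Explicitly add the squared-radius feature x1^2 + x2^2 as a third dimension.
X_3d = np.c_[X, X[:, 0] ** 2 + X[:, 1] ** 2]

# A plain linear SVM now separates the lifted data almost perfectly.
clf = SVC(kernel="linear").fit(X_3d, y)
print("Training accuracy in the lifted 3D space:", clf.score(X_3d, y))
```

Constructing such features by hand works here, but it quickly becomes impractical as the target dimension grows, and this is exactly the problem kernels solve.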
Kernels are functions that compute the dot products between data points in this potentially very high-dimensional feature space without ever computing the coordinates of the points in that space explicitly, which keeps the process computationally efficient. Common kernels include the linear kernel, the polynomial kernel, the Radial Basis Function (RBF, or Gaussian) kernel, and the sigmoid kernel.
By using kernels, SVMs can learn complex, non-linear decision boundaries.
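For example, the same concentric-circle data can be handled directly with an RBF kernel, so no explicit higher-dimensional features are ever computed. This is only a hedged sketch; the gamma and C values shown are arbitrary starting points, not tuned settings:

```python
# Classify concentric circles with an RBF-kernel SVM (the kernel trick in action).
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# The RBF kernel implicitly operates in a very high-dimensional space;
# gamma controls how far the influence of a single training point reaches.
clf = SVC(kernel="rbf", gamma=1.0, C=1.0).fit(X, y)
print("Training accuracy with the RBF kernel:", clf.score(X, y))
```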
Support Vector Machines are particularly effective in high-dimensional spaces (where the number of features is large, even larger than the number of samples) and are memory efficient due to their reliance on support vectors. The next section details how to implement SVM classifiers using Scikit-learn's SVC (Support Vector Classifier) estimator, allowing you to apply these concepts in practice.