Having explored regression problems where the objective is to predict a continuous numerical value, we now turn our attention to classification. In classification tasks, the goal is fundamentally different: we aim to assign each input data point to one of several predefined, distinct categories or classes.
Think back to regression, where the target variable y could take any value within a range, like predicting house prices or temperature. In classification, the target variable y is discrete and belongs to a finite set of labels. For example, an email might be classified as either {Spam, Not Spam}, a tumor might be diagnosed as {Benign, Malignant}, or a customer transaction might be flagged as {Fraudulent, Not Fraudulent}. These are examples of binary classification, where there are only two possible outcomes.
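In code, these discrete class labels are simply an array alongside the feature matrix. As a minimal sketch (the email labels below are invented for illustration), string class names can be encoded as integers with Scikit-learn's LabelEncoder, although most Scikit-learn classifiers also accept string targets directly:

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical binary labels for a handful of emails
labels = ["Spam", "Not Spam", "Not Spam", "Spam", "Not Spam"]

# LabelEncoder maps each distinct class name to an integer
# (alphabetical order: "Not Spam" -> 0, "Spam" -> 1)
encoder = LabelEncoder()
y = encoder.fit_transform(labels)

print(y)                 # [1 0 0 1 0]
print(encoder.classes_)  # ['Not Spam' 'Spam']
```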
Classification problems can also involve more than two categories. This is known as multiclass classification. Examples include:

- Recognizing handwritten digits, where each image is assigned one of the classes {0, 1, 2, ..., 9}.
- Categorizing news articles by topic (e.g., {Sports, Technology, Politics, Business}).
- Identifying the animal shown in an image (e.g., {Cat, Dog, Bird, Fish}).

Formally, given a dataset of input features X and corresponding class labels y, the objective of a classification algorithm (often called a classifier) is to learn a mapping function f. This function takes the features X of a new, unseen data point as input and predicts its class label $\hat{y}$:
$$\hat{y} = f(X)$$
where $\hat{y}$ belongs to the predefined set of possible classes $C = \{c_1, c_2, \ldots, c_k\}$. The classifier essentially learns a decision boundary in the feature space that separates the different classes.
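To make this concrete, the snippet below is a minimal sketch of learning such a mapping with Scikit-learn. It uses a LogisticRegression classifier (covered later in this chapter) fit on a synthetic two-class dataset from make_classification; the dataset and parameter values are illustrative only, and any estimator with fit and predict methods would follow the same pattern:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Generate a small synthetic two-class dataset: X holds features, y the labels
X, y = make_classification(n_samples=200, n_features=4, n_classes=2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Learn the mapping f from features to class labels
clf = LogisticRegression()
clf.fit(X_train, y_train)

# Predict labels for new, unseen points; each prediction is one of the discrete classes
y_hat = clf.predict(X_test)
print(y_hat[:10])
```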
Consider the difference visually: Regression seeks to fit a line or curve through data points, while classification seeks to find a boundary between groups of data points.
Figure: Regression aims to predict a continuous value (blue line fitting blue points), while classification aims to separate data into distinct groups (red dashed line separating green circles from purple squares).
Classification is a cornerstone of supervised machine learning with wide-ranging applications. Building accurate classifiers allows us to automate decision-making processes, identify patterns, and gain insights from labeled data. From filtering unwanted emails to aiding medical professionals and understanding customer behavior, classification models are indispensable tools.
In the following sections of this chapter, we will explore specific algorithms provided by Scikit-learn for tackling classification tasks, including Logistic Regression, K-Nearest Neighbors, and Support Vector Machines. We will also learn how to evaluate the performance of these classifiers using appropriate metrics, as simply measuring "accuracy" is often not sufficient to understand a model's true effectiveness.
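As a preview of why accuracy alone can mislead, consider the short sketch below. The labels are fabricated to be heavily imbalanced (95% negatives): a "model" that always predicts the majority class reaches 95% accuracy while never identifying a single positive case, which is exactly the kind of failure other metrics will help expose:

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

# Hypothetical imbalanced labels: 95 negatives and 5 positives
y_true = np.array([0] * 95 + [1] * 5)
y_pred = np.zeros_like(y_true)  # a "model" that predicts 0 for everything

print(accuracy_score(y_true, y_pred))  # 0.95 -> looks good on the surface
print(recall_score(y_true, y_pred))    # 0.0  -> catches none of the positives
```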