Classification is a supervised learning task where the goal is to predict discrete categories or labels, rather than continuous numerical values. For example, classification can predict if an email is spam, or assign an image to a specific object category.
Imagine you're not trying to predict a specific number, but rather trying to place something into a group or category. That's the essence of classification. The goal is to learn a mapping from input variables (features) to predefined, discrete categories or classes.
Think about these common scenarios:
* Is an email spam or not spam?
* Does an image show a cat, a dog, or a car?
* Does a patient have a disease or no disease?
* Will a customer churn (stop using a service) or not churn?

In each case, the prediction isn't a number on a sliding scale; it's a distinct label chosen from a finite set of possibilities.
The defining characteristic of a classification problem is that the target variable, the thing we want to predict, is categorical. This means it takes on values that represent distinct groups or classes.
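To make the idea of a categorical target concrete, here is a minimal sketch in plain Python. The four toy emails, their two features (a count of the word "free" and an attachment flag), and the variable names are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical toy emails: each row is
# (count of the word "free", 1 if the email has an attachment else 0).
features = [
    (5, 1),
    (0, 0),
    (3, 0),
    (0, 1),
]

# The target is categorical: every value comes from a small, fixed set of labels.
labels = ["spam", "not spam", "spam", "not spam"]

# The distinct classes form the finite set the model chooses from.
classes = sorted(set(labels))
class_counts = Counter(labels)

print(classes)               # ['not spam', 'spam']
print(class_counts["spam"])  # 2
```

The labels are names for groups, not quantities: there is no meaningful sense in which 'spam' is larger or smaller than 'not spam'.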
* Input: We use a set of features to make predictions. For spam detection, features might include the frequency of certain words, the sender's address, or whether the email contains attachments. For medical diagnosis, features could be patient age, blood pressure, or results from specific lab tests.
* Output: The prediction is a class label (or simply class or category). These labels are predefined. Examples include {'spam', 'not spam'}, {'cat', 'dog', 'car'}, {'disease A', 'disease B', 'healthy'}.

Classification problems can be broadly categorized into two main types:
1. Binary Classification: This is the simplest form, where there are only two possible outcome classes. Many questions fall into this category, often framed as yes/no decisions.
    * Examples: Spam detection (spam/not spam), medical testing (positive/negative), churn prediction (churn/no churn).
2. Multiclass Classification: Here there are more than two possible outcome classes, and each input is assigned to exactly one of them.
    * Examples: Handwritten digit recognition (digits 0, 1, 2, ..., 9), object recognition in images (person, car, tree, building), document categorization (sports, politics, technology, business).

Let's visualize a simple classification scenario. Imagine we have data points with two features (Feature 1 and Feature 2), and each point belongs to one of two classes (Class A or Class B).
Data points belonging to two different classes plotted based on two features. The goal of a classification algorithm is to learn how to distinguish between these classes based on their features.
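A dataset of the kind described in the caption can be generated in a few lines. This is a minimal sketch using only the Python standard library. The function name `make_two_class_data`, the cluster centres, and the spread are illustrative assumptions, not values taken from the original figure.

```python
import random


def make_two_class_data(n_per_class=20, seed=0):
    """Return (points, labels) for two synthetic, roughly separable classes."""
    rng = random.Random(seed)  # seeded so the data is reproducible
    points, labels = [], []
    # Assumed cluster centres: Class A near (1, 1), Class B near (4, 4).
    centres = {"Class A": (1.0, 1.0), "Class B": (4.0, 4.0)}
    for label, (cx, cy) in centres.items():
        for _ in range(n_per_class):
            # Each point is (Feature 1, Feature 2), scattered around its centre.
            points.append((rng.gauss(cx, 0.7), rng.gauss(cy, 0.7)))
            labels.append(label)
    return points, labels


points, point_labels = make_two_class_data()

# Show the first few rows: two feature values plus the class label.
for (f1, f2), label in list(zip(points, point_labels))[:3]:
    print(label, round(f1, 2), round(f2, 2))
```

Each row pairs two feature values with one class label, which is exactly the form of data a classification algorithm learns from.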
A classification algorithm's job is to learn a "rule" or "boundary" (often called a decision boundary, which we'll discuss later) that separates the different classes based on the patterns in the features. When a new, unseen data point arrives, the algorithm uses this learned rule to assign it to the most likely class.
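As a rough illustration of "learn a rule, then apply it to new points", the sketch below uses a nearest-centroid rule: it averages the training points of each class and assigns a new point to the class with the closest average. This is a deliberately simple stand-in, not one of the algorithms covered in this chapter. The training points and the helper names `fit_centroids` and `predict` are hypothetical.

```python
import math


def fit_centroids(points, labels):
    """Learn one centroid (mean point) per class from labeled training data."""
    sums, counts = {}, {}
    for (x, y), label in zip(points, labels):
        sx, sy = sums.get(label, (0.0, 0.0))
        sums[label] = (sx + x, sy + y)
        counts[label] = counts.get(label, 0) + 1
    return {
        label: (sx / counts[label], sy / counts[label])
        for label, (sx, sy) in sums.items()
    }


def predict(centroids, point):
    """Assign a new point to the class whose centroid is closest."""
    return min(centroids, key=lambda label: math.dist(point, centroids[label]))


# Hypothetical training data: (Feature 1, Feature 2) with known classes.
train_points = [(1.0, 1.2), (0.8, 0.9), (1.3, 1.1),
                (4.0, 3.8), (4.2, 4.1), (3.9, 4.3)]
train_labels = ["Class A", "Class A", "Class A",
                "Class B", "Class B", "Class B"]

centroids = fit_centroids(train_points, train_labels)

# New, unseen points are assigned to the most likely class.
print(predict(centroids, (1.1, 0.7)))  # Class A
print(predict(centroids, (3.5, 4.0)))  # Class B
```

With two classes, this rule implicitly draws a straight-line boundary halfway between the two centroids: points on one side are labeled Class A, points on the other side Class B.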
In this chapter, we will look at specific algorithms designed for these tasks, starting with Logistic Regression, which is commonly used for binary classification, and then K-Nearest Neighbors (KNN), a versatile algorithm applicable to both binary and multiclass problems. We'll also cover how to measure how well our classification models are performing.
© 2026 ApX Machine Learning