Okay, let's move from predicting numbers (regression) to predicting categories (classification). Imagine you want to determine if an email is spam or not spam, if a customer will click an ad or not, or if a tumor is malignant or benign. These are classification problems because the outcome is a distinct category or class.
You might wonder, "Can't we just use Linear Regression for this?" After all, we could assign numerical labels like 0 for "not spam" and 1 for "spam" and try to fit a line. However, this approach has significant drawbacks. Linear Regression predicts continuous values, which can fall outside the meaningful range of 0 to 1. For instance, it might predict a value of 1.5 or -0.2 for our spam example, which doesn't make sense as a category label or even as a probability. Also, the straight line produced by Linear Regression often doesn't capture the threshold-like nature of classification effectively, especially when dealing with outliers.
Consider this simple example where we want to classify points based on one feature:
A linear regression line (dashed red) attempts to fit binary data (blue dots at 0 and 1). Notice how the line extends beyond the 0-1 range and doesn't provide a clear decision point.
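To make this concrete, here is a minimal sketch of the same setup in Python. The feature values and labels are hypothetical, chosen only to reproduce the effect described above: a least-squares line fit to binary labels produces predictions outside the [0, 1] range.

```python
import numpy as np

# Hypothetical 1D data: small feature values belong to class 0, large to class 1
X = np.array([1.0, 2.0, 3.0, 4.0, 6.0, 7.0, 8.0, 9.0])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# Ordinary least-squares line: y ≈ w * x + b
w, b = np.polyfit(X, y, deg=1)

# Predictions near the extremes escape the [0, 1] range
for x_new in [0.0, 10.0]:
    print(f"x = {x_new:4.1f} -> linear prediction = {w * x_new + b:.2f}")
# x =  0.0 -> linear prediction = -0.33
# x = 10.0 -> linear prediction = 1.33
```

Neither -0.33 nor 1.33 can be read as a class label or a probability, which is exactly the problem.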
This is where Logistic Regression comes in. Despite its name containing "Regression," it's a fundamental algorithm designed specifically for classification tasks, particularly binary classification (problems with two outcome classes, often labeled 0 and 1).
Instead of outputting a raw prediction that can take on any value, as Linear Regression does, Logistic Regression calculates the probability that a given input belongs to the positive class (usually denoted as class 1). This probability is always constrained to be between 0 and 1, which is exactly what we need.
How does it achieve this? It starts similarly to Linear Regression by calculating a weighted sum of the input features (plus a bias term). However, it doesn't stop there. It passes this result through a special function called the Sigmoid function (also known as the logistic function). This function takes any real number as input and squashes it into an output value between 0 and 1.
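As a quick sketch, the Sigmoid function σ(z) = 1 / (1 + e^(-z)) is a one-liner in Python; the sample z values below are arbitrary and simply show the squashing behavior.

```python
import numpy as np

def sigmoid(z):
    """Map any real number z into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Large negative z -> near 0, z = 0 -> exactly 0.5, large positive z -> near 1
for z in [-5.0, -1.0, 0.0, 1.0, 5.0]:
    print(f"z = {z:5.1f} -> sigmoid(z) = {sigmoid(z):.3f}")
```

No matter how extreme the weighted sum z becomes, the output stays strictly between 0 and 1.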
So, the output of Logistic Regression isn't the class label itself, but a probability, let's call it p. For example, if we're predicting spam (class 1) vs. not spam (class 0), the model might output p=0.85 for a particular email. This means the model estimates an 85% probability that the email is spam.
To get the final discrete class prediction (0 or 1), we apply a decision threshold, commonly 0.5: if p ≥ 0.5, predict class 1; otherwise, predict class 0.
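Putting the pieces together, here is a minimal sketch using scikit-learn (an assumed dependency, with the same hypothetical toy data as before): the model returns probabilities, and the 0.5 threshold converts them into class labels.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Same hypothetical toy data as before
X = np.array([[1.0], [2.0], [3.0], [4.0], [6.0], [7.0], [8.0], [9.0]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

model = LogisticRegression().fit(X, y)

# predict_proba returns [P(class 0), P(class 1)] per row; keep P(class 1)
p = model.predict_proba(np.array([[2.5], [5.0], [7.5]]))[:, 1]

# Apply the 0.5 decision threshold to get discrete labels
labels = (p >= 0.5).astype(int)
print("P(class 1):     ", np.round(p, 3))
print("Predicted class:", labels)
```

Note that model.predict would apply the same 0.5 threshold internally; doing it by hand here just makes the decision step explicit.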
This process provides a much more suitable framework for classification than Linear Regression. It outputs probabilities, which are interpretable, and uses a threshold to make the final categorical decision.
In the next sections, we'll look more closely at the Sigmoid function that makes this possible and the concept of decision boundaries created by the model.