In the previous section, we introduced Logistic Regression as a method for binary classification, aiming to predict whether an input belongs to one of two classes (often labeled 0 or 1). But how does an algorithm that looks somewhat like Linear Regression end up predicting a category? Linear Regression produces outputs that can be any real number, which isn't directly useful for deciding between two discrete classes.
We need a way to transform the output of a linear equation, let's call it z=w⋅x+b (where w represents weights, x the input features, and b the bias), into a value that represents a probability. Probabilities are conveniently bounded between 0 and 1, making them ideal for classification tasks. If the probability is high (close to 1), we can predict class 1; if it's low (close to 0), we predict class 0.
This is where the Sigmoid function, also known as the logistic function, comes into play. It's a mathematical function that takes any real number as input and squashes it into an output between 0 and 1.
The sigmoid function, often denoted by the Greek letter sigma σ, is defined as:
σ(z) = 1 / (1 + e^(−z))

Here, z is the input to the function (which, in Logistic Regression, is the output of the linear part, w⋅x+b), and e is the base of the natural logarithm (approximately 2.718).
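As a quick sanity check, here is a minimal Python sketch of this definition using NumPy (the function name `sigmoid` and the sample inputs are our own choices for illustration):

```python
import numpy as np

def sigmoid(z):
    """Map any real number z to a value in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Sample inputs spanning negative, zero, and positive values
for z in [-5.0, -1.0, 0.0, 2.0, 5.0]:
    print(f"sigmoid({z:+.1f}) = {sigmoid(z):.4f}")
```

Running this shows outputs climbing from near 0 (for z = −5) through exactly 0.5 (at z = 0) toward 1 (for z = 5), never leaving the (0, 1) interval.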
Let's examine what this function does:

- For large positive z, e^(−z) approaches 0, so σ(z) approaches 1.
- For large negative z, e^(−z) grows very large, so σ(z) approaches 0.
- At z = 0, e^(−0) = 1, so σ(0) = 1/2 = 0.5, exactly halfway between the two extremes.

The function produces a characteristic "S" shape when plotted:
The sigmoid function σ(z) = 1 / (1 + e^(−z)) smoothly maps any real input z to an output between 0 and 1.
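If you want to reproduce the curve yourself, a minimal sketch using Matplotlib looks like this (the plotting range and styling are arbitrary choices):

```python
import numpy as np
import matplotlib.pyplot as plt

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Evaluate the sigmoid over a symmetric range around 0
z = np.linspace(-10, 10, 200)
plt.plot(z, sigmoid(z))
plt.axhline(0.5, linestyle="--", color="gray")  # output 0.5 ...
plt.axvline(0.0, linestyle="--", color="gray")  # ... occurs at z = 0
plt.xlabel("z")
plt.ylabel("sigmoid(z)")
plt.title("The sigmoid function")
plt.show()
```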
In Logistic Regression, the model calculates the linear combination z=w⋅x+b just like in Linear Regression. However, instead of using z directly as the prediction, it feeds z into the sigmoid function:
h_θ(x) = σ(z) = σ(w⋅x + b) = 1 / (1 + e^(−(w⋅x + b)))

The output h_θ(x) (where θ represents the model's parameters, w and b) is now interpreted as the estimated probability that the input x belongs to the positive class (class 1).
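Putting the two pieces together, here is a minimal sketch of this forward pass. The weights `w`, bias `b`, and input `x` below are made-up values for illustration, not a trained model:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical parameters and input; a real model would learn w and b
w = np.array([0.8, -0.4])   # weights
b = 0.1                     # bias
x = np.array([2.0, 1.5])    # input features

z = np.dot(w, x) + b        # linear part: z = w·x + b
prob = sigmoid(z)           # estimated P(y = 1 | x)
print(f"z = {z:.2f}, P(class 1) = {prob:.2f}")
```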
For example, if for a given input x, the model calculates z=2, then the output probability is σ(2)≈0.88. This means the model estimates an 88% chance that this input belongs to class 1. If another input results in z=−1, the output probability is σ(−1)≈0.27, indicating a 27% chance of belonging to class 1 (or conversely, a 73% chance of belonging to class 0).
This probabilistic output is fundamental to Logistic Regression. Typically, we set a decision threshold (often 0.5) to convert this probability into a definite class prediction. If h_θ(x) ≥ 0.5, we predict class 1; otherwise, we predict class 0. With a threshold of 0.5, the point where z = 0 (and hence σ(z) = 0.5) corresponds to the boundary separating the predicted classes, which we will explore next when discussing decision boundaries.
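Converting the probability into a class label is then a one-line comparison. A small sketch, reusing the probabilities from the worked examples above (the 0.5 threshold is the conventional default, but it can be tuned):

```python
def predict_class(prob, threshold=0.5):
    """Turn an estimated probability of class 1 into a hard 0/1 prediction."""
    return 1 if prob >= threshold else 0

# Probabilities from the earlier examples: sigmoid(2) and sigmoid(-1)
for p in [0.88, 0.27]:
    print(f"P(class 1) = {p:.2f} -> predicted class {predict_class(p)}")
```

The first input lands above the threshold and is assigned class 1; the second falls below it and is assigned class 0.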