In the previous section, we introduced Logistic Regression as a method for binary classification, aiming to predict whether an input belongs to one of two classes (often labeled 0 or 1). But how does an algorithm that looks somewhat like Linear Regression end up predicting a category? Linear Regression produces outputs that can be any real number, which isn't directly useful for deciding between two discrete classes.
We need a way to transform the output of a linear equation, let's call it $z = \mathbf{w} \cdot \mathbf{x} + b$ (where $\mathbf{w}$ represents the weights, $\mathbf{x}$ the input features, and $b$ the bias), into a value that represents a probability. Probabilities are conveniently bounded between 0 and 1, making them ideal for classification tasks. If the probability is high (close to 1), we can predict class 1; if it's low (close to 0), we predict class 0.
This is where the Sigmoid function, also known as the logistic function, comes into play. It's a mathematical function that takes any real number as input and squashes it into an output between 0 and 1.
The sigmoid function, often denoted by the Greek letter sigma ($\sigma$), is defined as:

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$
Here, $z$ is the input to the function (which, in Logistic Regression, is the output of the linear part, $\mathbf{w} \cdot \mathbf{x} + b$), and $e$ is the base of the natural logarithm (approximately 2.718).
Let's examine what this function does:

- For large positive inputs ($z \to +\infty$), $e^{-z}$ approaches 0, so $\sigma(z)$ approaches 1.
- For large negative inputs ($z \to -\infty$), $e^{-z}$ grows very large, so $\sigma(z)$ approaches 0.
- At $z = 0$, $e^{0} = 1$, so $\sigma(0) = \frac{1}{1 + 1} = 0.5$.
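These values are easy to check numerically. Here is a minimal NumPy sketch (the `sigmoid` helper below is our own illustration, not a library routine):

```python
import numpy as np

def sigmoid(z):
    """Squash any real number z into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Large negative inputs approach 0, large positive inputs approach 1,
# and the midpoint sigmoid(0) is exactly 0.5.
for z in [-10, -1, 0, 2, 10]:
    print(f"sigmoid({z:3d}) = {sigmoid(z):.4f}")
# sigmoid(-10) = 0.0000
# sigmoid( -1) = 0.2689
# sigmoid(  0) = 0.5000
# sigmoid(  2) = 0.8808
# sigmoid( 10) = 1.0000
```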
The function produces a characteristic "S" shape when plotted: the curve rises smoothly from near 0 on the left to near 1 on the right, mapping any real input to an output between 0 and 1.
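To reproduce the plot yourself, a short Matplotlib sketch along these lines works (the styling choices here are illustrative, not from the original figure):

```python
import numpy as np
import matplotlib.pyplot as plt

z = np.linspace(-10, 10, 200)        # a range of real-valued inputs
sigma = 1.0 / (1.0 + np.exp(-z))     # sigmoid outputs, all in (0, 1)

plt.plot(z, sigma)
plt.axhline(0.5, linestyle="--", linewidth=0.8)  # sigma(0) = 0.5
plt.axvline(0.0, linestyle="--", linewidth=0.8)
plt.xlabel("z")
plt.ylabel("sigma(z)")
plt.title("The sigmoid (logistic) function")
plt.show()
```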
In Logistic Regression, the model calculates the linear combination $z = \mathbf{w} \cdot \mathbf{x} + b$ just like in Linear Regression. However, instead of using $z$ directly as the prediction, it feeds $z$ into the sigmoid function:

$$\hat{y} = \sigma(z) = \frac{1}{1 + e^{-(\mathbf{w} \cdot \mathbf{x} + b)}}$$
The output $\hat{y} = \sigma(z)$ (where the model's parameters are the weights $\mathbf{w}$ and the bias $b$) is now interpreted as the estimated probability $P(y = 1 \mid \mathbf{x})$ that the input $\mathbf{x}$ belongs to the positive class (class 1).
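Putting the two steps together, a minimal sketch of this probability computation might look like the following (the weights, bias, and input are made up for illustration; a real model would learn them from data):

```python
import numpy as np

def predict_proba(w, b, x):
    """Estimated probability that input x belongs to class 1."""
    z = np.dot(w, x) + b                 # linear part, as in Linear Regression
    return 1.0 / (1.0 + np.exp(-z))      # sigmoid squashes z into (0, 1)

w = np.array([0.5, -1.2])   # hypothetical learned weights
b = 0.3                     # hypothetical learned bias
x = np.array([2.0, 0.5])    # one input example

print(predict_proba(w, b, x))  # z = 0.7, so the probability is about 0.668
```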
For example, if for a given input $\mathbf{x}$, the model calculates $z = 2$, then the output probability is $\sigma(2) \approx 0.88$. This means the model estimates an 88% chance that this input belongs to class 1. If another input results in $z = -1$, the output probability is $\sigma(-1) \approx 0.27$, indicating a 27% chance of belonging to class 1 (or conversely, a 73% chance of belonging to class 0).
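Plugging these values into the definition makes the arithmetic explicit:

$$\sigma(2) = \frac{1}{1 + e^{-2}} \approx \frac{1}{1 + 0.135} \approx 0.88, \qquad \sigma(-1) = \frac{1}{1 + e^{1}} \approx \frac{1}{1 + 2.718} \approx 0.27$$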
This probabilistic output is fundamental to Logistic Regression. Typically, we set a decision threshold (often 0.5) to convert this probability into a definite class prediction. If $\sigma(z) \geq 0.5$, we predict class 1; otherwise, we predict class 0. The point where $\sigma(z) = 0.5$, which occurs at $z = 0$, often corresponds to the boundary separating the predicted classes, which we will explore next when discussing decision boundaries.
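As a final sketch, applying the threshold is a one-line rule (the `predict_class` helper is our own naming, not a library function):

```python
def predict_class(probability, threshold=0.5):
    """Convert an estimated probability into a hard 0/1 class prediction."""
    return 1 if probability >= threshold else 0

print(predict_class(0.88))  # 1: probability above the threshold -> class 1
print(predict_class(0.27))  # 0: probability below the threshold -> class 0
```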