Okay, let's shift gears from classification, where we assign labels like 'cat' or 'dog', to another common type of machine learning problem: regression.
While classification predicts which category something belongs to, regression predicts a continuous numerical value. Think of it as predicting "how much" or "how many" rather than "what kind". The output of a regression model isn't a fixed label, but rather a number that can fall anywhere on a continuous scale.
Here are a few common scenarios where regression is used:
Imagine you want to estimate the market value of a house. This is a classic regression problem. You would provide the model with information about the house, known as features.
The model learns patterns from historical data containing features of houses and their actual selling prices. Based on these patterns, it predicts a specific price for a new house, like $345,000 or €280,000. This predicted price is a continuous numerical output.
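As a rough illustration of learning from historical data, the sketch below fits a straight line to a handful of past sales and then prices a new house. It assumes a single feature (size in square feet) and uses invented, perfectly linear numbers so the result is easy to verify; the helper `fit_line` is hypothetical, and real housing data would be noisier and involve many more features.

```python
# Hypothetical training data: size in square feet and selling price in dollars.
# The values are made up and perfectly linear so the fit is easy to check.
sizes = [1000.0, 1500.0, 2000.0, 2500.0]
prices = [200000.0, 275000.0, 350000.0, 425000.0]


def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and sum of squared deviations of x.
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum_xy / sum_xx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = fit_line(sizes, prices)

# Predict a price for a new house that was not in the training data.
new_size = 1800.0
predicted_price = intercept + slope * new_size
print(f"Predicted price: ${predicted_price:,.0f}")  # Predicted price: $320,000
```

The prediction is a number on a continuous scale, not one of a fixed set of labels, which is what makes this a regression task.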
Predicting the temperature for tomorrow is another example. Features could include today's temperature, humidity levels, wind speed, atmospheric pressure, time of year, and historical weather patterns. The regression model processes this information to output a numerical prediction, such as 22.5 degrees Celsius or 72 degrees Fahrenheit.
A company might want to predict how much a specific customer is likely to spend in the next month. Features could involve the customer's past purchase history, browsing behavior on the website, demographics, and engagement with marketing campaigns. The model's output would be a predicted spending amount, like $85.50.
In all these examples, the regression model tries to understand and quantify the relationship between the input features and the continuous output value. It essentially learns a mathematical function that maps the inputs to the output.
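To make the idea of a learned function concrete, here is a minimal sketch of what such a mapping might look like once training is done, using the temperature example. The function name `predict_temperature`, the choice of three features, and every weight are assumptions made up for illustration; a real model would learn its parameters from data and need not be linear.

```python
def predict_temperature(today_temp_c, humidity, wind_speed_kmh):
    """Map three input features to one continuous output (degrees Celsius)."""
    # Hypothetical parameters; a trained model would learn these from data.
    bias = 6.0
    w_temp, w_humidity, w_wind = 0.75, -2.0, 0.125
    return (
        bias
        + w_temp * today_temp_c
        + w_humidity * humidity
        + w_wind * wind_speed_kmh
    )


forecast = predict_temperature(today_temp_c=22.0, humidity=0.5, wind_speed_kmh=8.0)
print(forecast)  # 22.5
```

Changing any input slightly changes the output slightly, which reflects the continuous nature of the prediction.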
A conceptual plot showing predicted values versus actual values for a regression task. Each point represents one prediction. Points lying exactly on the dashed line would indicate perfect predictions. The distance from a point to the line shows the prediction error.
The key difference between classification and regression lies in the nature of the output. Classification outputs discrete labels (e.g., 'spam', 'not spam', 'blue', 'green', 'red'). Regression outputs continuous values (e.g., 10.5, 350000, -5.2). Because of this difference, regression models need different evaluation measures than classification models. We can't simply count how many predictions were "correct", since a numerical prediction will rarely match the actual value exactly. Instead, we need to measure how close our numerical predictions are to the actual values. We will look at specific metrics designed for this purpose in Chapter 3.
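The sketch below illustrates why counting exact matches is uninformative for regression. It compares two hypothetical models on invented data using an exact-match rate and the mean absolute error, one common closeness measure chosen here purely as an example; it is not necessarily one of the metrics covered later.

```python
def exact_match_rate(actual, predicted):
    """Fraction of predictions that equal the actual value exactly."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def mean_absolute_error(actual, predicted):
    """Average absolute distance between predictions and actual values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Invented values for illustration only.
actual = [200.0, 150.0, 310.0, 95.0]
model_a = [201.0, 151.0, 311.0, 96.0]   # always off by 1
model_b = [250.0, 100.0, 400.0, 20.0]   # off by much larger amounts

print(exact_match_rate(actual, model_a))     # 0.0
print(exact_match_rate(actual, model_b))     # 0.0
print(mean_absolute_error(actual, model_a))  # 1.0
print(mean_absolute_error(actual, model_b))  # 66.25
```

Both models score zero on exact matches, yet model A is clearly the better one. Only a distance-based measure reveals that.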
Understanding whether your problem requires predicting a category (classification) or a quantity (regression) is a fundamental first step before building and evaluating any machine learning model.
© 2025 ApX Machine Learning