Let's solidify the concepts of linear regression with a straightforward example. Remember, our goal in regression is to predict a continuous numerical value. In linear regression, we aim to find the best straight line that describes the relationship between an input variable (feature) and an output variable (target).
Imagine we want to understand the relationship between the number of hours a student studies and their score on a final exam. This is a classic regression problem: given the hours studied (input), we want to predict the exam score (output).
Suppose we have collected data from five students:
| Hours Studied (x) | Exam Score (y) |
|---|---|
| 1 | 40 |
| 2 | 55 |
| 3 | 65 |
| 4 | 70 |
| 5 | 80 |
Our input feature $x$ is 'Hours Studied', and our output target $y$ is 'Exam Score'. We want to find a linear relationship between them.
Before building a model, it's always a good idea to visualize the data. A scatter plot is perfect for this, showing each student as a point based on their hours studied and exam score.
Scatter plot showing the relationship between hours studied and exam score for five students.
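If you want to reproduce a plot like this yourself, a few lines of Python are enough. The sketch below assumes matplotlib is available; the library choice is ours, not something specified in this example.

```python
# Minimal sketch of the scatter plot, assuming matplotlib is installed.
import matplotlib.pyplot as plt

hours = [1, 2, 3, 4, 5]        # x: hours studied
scores = [40, 55, 65, 70, 80]  # y: exam scores

plt.scatter(hours, scores)
plt.xlabel("Hours Studied (x)")
plt.ylabel("Exam Score (y)")
plt.title("Hours Studied vs. Exam Score")
plt.show()
```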
Looking at the plot, the points seem to roughly follow an upward trend: more hours studied generally correspond to higher scores. A straight line seems like a reasonable way to approximate this relationship.
Simple linear regression models this relationship using the equation of a straight line:
$$y = mx + b$$
Or, using notation common in machine learning:
$$\hat{y} = \theta_1 x + \theta_0$$
Here:
- $\hat{y}$ is the predicted exam score,
- $x$ is the number of hours studied,
- $\theta_1$ is the slope of the line (how much the predicted score changes per additional hour studied),
- $\theta_0$ is the intercept (the predicted score when $x = 0$).
How do we find the specific values for $\theta_0$ and $\theta_1$ that give us the "best" line? As we discussed earlier, "best" means minimizing the difference between the predicted scores ($\hat{y}$) from our line and the actual scores ($y$) in our dataset. This difference is measured by a cost function, such as the Mean Squared Error.
Gradient descent is typically used to iteratively adjust $\theta_0$ and $\theta_1$, gradually reducing the cost until we reach the values that minimize the error and give us the line that fits the data most closely.
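To make this fitting step concrete, here is a minimal from-scratch sketch of gradient descent on this dataset. The learning rate and number of iterations are illustrative choices, not values from the text.

```python
# Gradient descent for simple linear regression on the study-hours data (sketch).
hours = [1, 2, 3, 4, 5]        # x: hours studied
scores = [40, 55, 65, 70, 80]  # y: exam scores

theta0, theta1 = 0.0, 0.0      # start from an arbitrary line
learning_rate = 0.05           # illustrative step size
n = len(hours)

for _ in range(10_000):
    # Prediction error of the current line for each student
    errors = [(theta1 * x + theta0) - y for x, y in zip(hours, scores)]
    # Gradients of the mean squared error with respect to each parameter
    grad_theta0 = (2 / n) * sum(errors)
    grad_theta1 = (2 / n) * sum(e * x for e, x in zip(errors, hours))
    # Step both parameters downhill
    theta0 -= learning_rate * grad_theta0
    theta1 -= learning_rate * grad_theta1

print(f"slope theta1 = {theta1:.2f}, intercept theta0 = {theta0:.2f}")
```

With these settings, the parameters settle near the least-squares solution for this small dataset.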
Let's assume that after applying this process, we find the best-fit line to be approximately:
$$\hat{y} = 10x + 30$$
So, our model parameters are $\theta_1 = 10$ and $\theta_0 = 30$.
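As a cross-check, a library such as scikit-learn (assuming it is installed) can fit the same model in a few lines; its exact least-squares solution will be close to, though not identical to, the rounded values above.

```python
# Fitting the same model with scikit-learn as a sanity check (sketch).
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1], [2], [3], [4], [5]])  # hours studied, one sample per row
y = np.array([40, 55, 65, 70, 80])       # exam scores

model = LinearRegression().fit(X, y)
print(model.coef_[0], model.intercept_)  # slope (theta1) and intercept (theta0)
```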
Let's visualize this line on our scatter plot:
Scatter plot with the calculated linear regression line showing the trend in the data.
The line doesn't pass perfectly through every point (that's rare with real data), but it captures the general trend.
Now that we have our model ($\hat{y} = 10x + 30$), we can use it to predict the exam score for a student based on the hours they studied.
For example, what score would we predict for a student who studied for 2.5 hours? We plug $x = 2.5$ into our equation:
$$\hat{y} = 10 \times 2.5 + 30 = 25 + 30 = 55$$
Our model predicts an exam score of 55 for a student who studied 2.5 hours.
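In code, this prediction is a one-line calculation; the `predict_score` helper below is just an illustrative name for it.

```python
# Predicting a score with the fitted parameters from this example.
theta1, theta0 = 10, 30

def predict_score(hours_studied):
    """Return the predicted exam score for a given number of hours studied."""
    return theta1 * hours_studied + theta0

print(predict_score(2.5))  # 55.0, matching the worked calculation above
```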
The parameters of our model also give us insights:
- The slope $\theta_1 = 10$ suggests that, according to our model, each additional hour of study is associated with an increase of about 10 points on the exam.
- The intercept $\theta_0 = 30$ is the predicted score for a student who studied zero hours. This baseline should be interpreted with care, since we have no data at $x = 0$.
In this example, we:
- Collected a small dataset pairing hours studied with exam scores.
- Visualized the data with a scatter plot to confirm a roughly linear trend.
- Defined a simple linear model, $\hat{y} = \theta_1 x + \theta_0$.
- Found parameter values that fit the data closely (approximately $\theta_1 = 10$ and $\theta_0 = 30$).
- Used the fitted line to predict the score for a new input and interpreted what the parameters mean.
This simple example illustrates the core idea behind linear regression: modeling linear relationships in data to make predictions. While real-world problems often involve more features and complexities, the fundamental principles remain the same.