Regression analysis helps quantify and understand relationships between variables. Simple linear regression (SLR) specifically models how a single dependent or response variable changes in response to another single independent or predictor variable.
Imagine you have data on two variables, say, years of experience () and salary (). You might suspect that as experience increases, salary tends to increase as well. Simple linear regression provides a formal way to model this suspected linear relationship.
"At its core, simple linear regression assumes that the relationship between the independent variable and the dependent variable can be approximated by a straight line. However, data rarely falls perfectly on a line. There's almost always some scatter or variability. To account for this, the theoretical model for simple linear regression is written as:"
Let's break down this equation:
Think of the part as the deterministic, linear component of the relationship, and as the random, unexplained component.
The equation describes the theoretical relationship in the entire population. In practice, we rarely have access to population data. Instead, we work with a sample drawn from the population. Our goal is to use the sample data to estimate the unknown population parameters and .
We denote the estimates calculated from the sample data as (or sometimes ) and (or ). The estimated regression line based on the sample is then:
Here, (read "y-hat") represents the predicted value of for a given value of , based on our sample estimates. The difference between an observed value in our sample and its corresponding predicted value is the sample residual, . These sample residuals are our observable stand-ins for the unobservable theoretical errors .
The challenge, which we'll address in the next section, is how to find the "best" values for and based on our sample data points .
A scatter plot is the ideal way to visualize the data and the potential linear relationship before fitting a model. Simple linear regression essentially tries to find the line that best cuts through the cloud of points on this scatter plot.
A scatter plot showing individual data points (blue dots) and a potential line (red dashed) representing the simple linear regression model . The goal is to find the line that minimizes the overall distance between the line and the points.
Understanding this basic model structure is fundamental. It not only allows us to model simple relationships but also serves as the foundation for more complex regression techniques, including multiple linear regression (using multiple predictors) and polynomial regression (modeling curves), which are frequently used in machine learning.
Was this section helpful?
© 2026 ApX Machine LearningEngineered with