Simple linear regression helps to understand the relationship between a single predictor variable and a response. However, many phenomena are influenced by multiple factors simultaneously. Predicting house prices, for example, likely involves considering not just square footage but also the number of bedrooms, location, age of the house, and so on. Multiple linear regression extends the principles of linear regression to handle situations with multiple predictor variables.
The core idea is to expand the simple linear regression equation to include terms for each predictor variable. If we have $p$ predictor variables, $X_1, X_2, \ldots, X_p$, the multiple linear regression model is defined as:

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p + \epsilon$$
Let's break down the components:

- $Y$: the response variable we want to predict.
- $X_1, X_2, \ldots, X_p$: the predictor variables.
- $\beta_0$: the intercept, the expected value of $Y$ when all predictors are zero.
- $\beta_1, \ldots, \beta_p$: the coefficients, each describing the effect of its predictor on the response.
- $\epsilon$: the error term, capturing variation in $Y$ not explained by the predictors.
This model represents a linear relationship between the predictors and the response. While simple linear regression describes a line, multiple linear regression with two predictors describes a plane, and with more than two predictors, it describes a hyperplane in higher-dimensional space.
A multiple linear regression model relates multiple predictor variables ($X_1, X_2, \ldots, X_p$) to a single response variable ($Y$) through estimated coefficients ($\beta_1, \ldots, \beta_p$) and an intercept ($\beta_0$).
The interpretation of coefficients in multiple regression requires careful consideration. Each coefficient, $\beta_j$, represents the expected change in the response variable for a one-unit increase in the corresponding predictor variable $X_j$, assuming all other predictor variables in the model are held constant.
This "holding other variables constant" aspect is important. The value of in a multiple regression model might be different from the coefficient for in a simple linear regression . This is because the multiple regression coefficient accounts for the influence of on , isolating the unique contribution of .
Just like in simple linear regression, the coefficients ($\beta_0, \beta_1, \ldots, \beta_p$) are typically estimated using the method of least squares. The goal remains the same: find the coefficients that minimize the sum of the squared differences between the observed values ($y_i$) and the values predicted by the model ($\hat{y}_i$). While the mathematical calculations involve matrix algebra (especially when $p > 1$), the underlying principle is identical. Fortunately, libraries like Scikit-learn and Statsmodels handle these computations for us.
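As a minimal sketch, assuming made-up feature names and data echoing the house-price example from the introduction, the estimates from Scikit-learn's LinearRegression match those obtained directly from the least-squares normal equations:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
n = 200
X = np.column_stack([
    rng.uniform(500, 3500, n),   # square footage (illustrative)
    rng.integers(1, 6, n),       # number of bedrooms (illustrative)
    rng.uniform(0, 50, n),       # age of the house in years (illustrative)
])
y = 50_000 + 120 * X[:, 0] + 8_000 * X[:, 1] - 500 * X[:, 2] + rng.normal(0, 10_000, n)

model = LinearRegression().fit(X, y)
print("Intercept (beta_0):", model.intercept_)
print("Coefficients (beta_1..beta_3):", model.coef_)

# The same estimates via the normal equations X'X beta = X'y:
Xd = np.column_stack([np.ones(n), X])  # prepend a column of ones for the intercept
beta = np.linalg.solve(Xd.T @ Xd, Xd.T @ y)
print("Normal-equation estimates:", beta)
```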
Model evaluation also uses familiar metrics, such as the mean squared error (MSE) and the coefficient of determination, $R^2$.

However, $R^2$ has a limitation in the multiple regression context: it always increases or stays the same when you add more predictors to the model, even if those predictors are irrelevant. This can be misleading.
To address this, we often use Adjusted R-squared. This metric modifies $R^2$ by penalizing the inclusion of extra predictors that do not significantly improve the model's fit. Adjusted $R^2$ provides a more honest assessment when comparing models with different numbers of predictors. It increases only if the added variable improves the model more than would be expected by chance.
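A common form of the adjustment is $R^2_{adj} = 1 - (1 - R^2)\frac{n - 1}{n - p - 1}$, where $n$ is the number of observations and $p$ the number of predictors. The following sketch (synthetic data; the variable names are illustrative) shows $R^2$ creeping upward when a pure-noise predictor is added, while adjusted $R^2$ drops:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def adjusted_r2(r2, n, p):
    # Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

rng = np.random.default_rng(1)
n = 100
x1 = rng.normal(size=n)
noise = rng.normal(size=n)                 # pure noise, unrelated to y
y = 1.0 + 2.0 * x1 + rng.normal(size=n)

for X, label in [(x1.reshape(-1, 1), "x1 only"),
                 (np.column_stack([x1, noise]), "x1 + noise")]:
    r2 = LinearRegression().fit(X, y).score(X, y)
    print(f"{label}: R^2 = {r2:.4f}, adjusted R^2 = {adjusted_r2(r2, n, X.shape[1]):.4f}")
```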
The assumptions underlying simple linear regression generally extend to multiple linear regression:

- Linearity: the relationship between the predictors and the response is linear.
- Independence: the errors are independent of one another.
- Homoscedasticity: the errors have constant variance across all levels of the predictors.
- Normality: the errors are approximately normally distributed.
In addition, multiple regression introduces a new potential issue: multicollinearity. This occurs when predictor variables are highly correlated with one another, making it difficult to separate their individual effects on the response; the estimated coefficients become unstable and their standard errors inflate. A common diagnostic is sketched below.
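One standard diagnostic is the variance inflation factor (VIF). This sketch uses Statsmodels' variance_inflation_factor on synthetic data where one predictor is nearly a copy of another; a common rule of thumb flags VIF values above 5 or 10 as problematic:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(2)
n = 300
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.1, size=n)   # nearly a copy of x1
x3 = rng.normal(size=n)                   # independent predictor

X = sm.add_constant(np.column_stack([x1, x2, x3]))  # include an intercept column
for i, name in enumerate(["const", "x1", "x2", "x3"]):
    print(f"{name}: VIF = {variance_inflation_factor(X, i):.1f}")  # x1, x2 show large VIFs
```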
Building effective multiple regression models often involves selecting the most relevant predictors (feature selection), checking for interactions between predictors (where the effect of one predictor depends on the level of another), and validating the model assumptions.
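For instance, interaction terms can be generated mechanically. This sketch (synthetic data, illustrative names) uses Scikit-learn's PolynomialFeatures with interaction_only=True to add an $X_1 X_2$ product term before fitting:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(3)
n = 400
X = rng.normal(size=(n, 2))
y = 1.0 + 2.0 * X[:, 0] + 0.5 * X[:, 1] + 1.5 * X[:, 0] * X[:, 1] + rng.normal(size=n)

# interaction_only=True adds products of predictors (x1*x2) without squared terms
interactions = PolynomialFeatures(degree=2, interaction_only=True, include_bias=False)
X_int = interactions.fit_transform(X)     # columns: x1, x2, x1*x2
model = LinearRegression().fit(X_int, y)
print("Feature names:", interactions.get_feature_names_out(["x1", "x2"]))
print("Coefficients:", model.coef_)       # recovers roughly [2.0, 0.5, 1.5]
```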
This overview sets the stage for applying these concepts. In practice, you'll use software tools to fit these models, interpret their output, and diagnose potential problems, allowing you to build more comprehensive predictive models from your data.