Sometimes, the relationships between your input features and the target variable aren't straightforwardly linear. Furthermore, the impact of one feature might change depending on the value of another. Linear models, in their basic form, struggle to capture these complexities. Creating interaction terms and polynomial features allows your models, even linear ones, to represent more intricate patterns in the data.
Imagine you have a feature $x$, and the relationship with your target $y$ looks more like a curve than a straight line. A simple linear model assumes $y \approx w_1 x + w_0$, which won't fit the curve well. Polynomial features address this by adding new features that are powers of the existing ones.
For instance, if you have a single feature $x_1$, generating polynomial features of degree 2 would transform your feature set from $[x_1]$ to $[1, x_1, x_1^2]$. The '1' represents the intercept or bias term. If you had two features, $x_1$ and $x_2$, a degree-2 polynomial transformation would produce $[1, x_1, x_2, x_1^2, x_1 x_2, x_2^2]$. Notice this includes the original features, their squared terms, and also a term representing their interaction ($x_1 x_2$).
By adding terms like $x_1^2$, you allow a linear model to fit a quadratic relationship:

$$y \approx w_1 x_1 + w_2 x_1^2 + w_0$$

This equation is still linear with respect to the coefficients $(w_1, w_2, w_0)$, which is what matters for linear model algorithms. You've essentially transformed the feature space so that a linear model can find a non-linear decision boundary or regression curve in the original space.
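As a quick illustration of the idea, here is a minimal sketch that builds the squared column by hand and fits an ordinary linear model on the expanded matrix. The synthetic generating function ($y = 2x^2 + 3x + 1$, noise-free) is made up purely for this example:
import numpy as np
from sklearn.linear_model import LinearRegression
# Synthetic quadratic data: y = 2*x^2 + 3*x + 1 (no noise, for illustration)
x = np.linspace(-3, 3, 20)
y = 2 * x**2 + 3 * x + 1
# Manually expand [x] into [x, x^2] and fit an ordinary linear model
X_expanded = np.column_stack([x, x**2])
model = LinearRegression().fit(X_expanded, y)
print(model.coef_, model.intercept_)  # approximately [3. 2.] and 1.0
The estimator itself is still linear; it simply recovers the quadratic coefficients because we handed it the squared feature as an extra column.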
Scikit-learn provides the PolynomialFeatures transformer for this purpose. Let's see a simple example:
import numpy as np
import pandas as pd
from sklearn.preprocessing import PolynomialFeatures
# Sample data with one feature
X = np.arange(6).reshape(6, 1)
print("Original Features (X):")
print(X)
# Create polynomial features of degree 2
poly = PolynomialFeatures(degree=2, include_bias=False) # exclude bias column
X_poly = poly.fit_transform(X)
print("\nPolynomial Features (degree 2):")
print(X_poly)
print("\nFeature names:")
print(poly.get_feature_names_out(['x1']))
Output:
Original Features (X):
[[0]
 [1]
 [2]
 [3]
 [4]
 [5]]

Polynomial Features (degree 2):
[[ 0.  0.]
 [ 1.  1.]
 [ 2.  4.]
 [ 3.  9.]
 [ 4. 16.]
 [ 5. 25.]]

Feature names:
['x1' 'x1^2']
As you can see, the transformer added a new feature representing the square of the original feature x1.
Let's visualize how this helps fit non-linear data. Consider data generated from a quadratic function $y = 0.5x^2 - x + 2 + \text{noise}$. A standard linear regression will perform poorly. If we first apply a degree-2 polynomial transformation to $x$, the linear regression can fit the curve much better.
A standard linear model (dashed red line) fails to capture the curve in the data points (blue dots), while a linear model applied to degree-2 polynomial features (solid green line) provides a much better fit.
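A sketch of how such a comparison can be produced is shown below. The sample size, noise scale, and random seed are assumptions made for illustration, not values taken from the figure:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
rng = np.random.RandomState(42)  # assumed seed for reproducibility
# Data from y = 0.5*x^2 - x + 2 + noise (noise scale chosen for illustration)
X = np.sort(6 * rng.rand(100, 1) - 3, axis=0)
y = 0.5 * X.ravel()**2 - X.ravel() + 2 + rng.randn(100)
# Plain linear regression on x
lin_reg = LinearRegression().fit(X, y)
# Linear regression on degree-2 polynomial features of x
poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly.fit_transform(X)
poly_reg = LinearRegression().fit(X_poly, y)
plt.scatter(X, y, s=10, label="data")
plt.plot(X, lin_reg.predict(X), "r--", label="linear fit")
plt.plot(X, poly_reg.predict(X_poly), "g-", label="degree-2 fit")
plt.legend()
plt.show()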
While powerful, adding polynomial features increases the dimensionality of your dataset significantly. A degree-$d$ polynomial expansion of $n$ features generates on the order of $n^d$ new features (more precisely, $\binom{n+d}{d}$ terms including the bias). This increases computational cost and elevates the risk of overfitting, where the model learns the noise in the training data rather than the underlying pattern. Regularization techniques or subsequent feature selection become even more important.
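To get a feel for this growth, you can inspect the n_output_features_ attribute of a fitted PolynomialFeatures at several degrees. This is a small sketch; the 10-feature random input is arbitrary:
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
X_demo = np.random.rand(5, 10)  # 10 original features
for degree in [1, 2, 3, 4]:
    poly = PolynomialFeatures(degree=degree, include_bias=False)
    poly.fit(X_demo)
    print(degree, poly.n_output_features_)
# With 10 features: degree 1 -> 10, degree 2 -> 65, degree 3 -> 285, degree 4 -> 1000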
Sometimes the effect of one feature depends on the level of another. For example, in predicting house prices, the value added by an extra bedroom (num_bedrooms) might be much higher for houses with a large square footage (sqft) compared to smaller ones. A simple additive model wouldn't capture this; it would assume the price increase per bedroom is constant regardless of house size.
An interaction term is a feature created by multiplying two or more original features. In the house price example, adding a feature num_bedrooms * sqft allows the model to learn this combined effect, as sketched below.
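A minimal sketch of creating this interaction by hand with pandas; the DataFrame and its values are made up for illustration:
import pandas as pd
# Hypothetical housing data for illustration
houses = pd.DataFrame({
    'num_bedrooms': [2, 3, 4, 3],
    'sqft': [850, 1400, 2600, 1100],
})
# Interaction term: the effect of an extra bedroom can now scale with house size
houses['bedrooms_x_sqft'] = houses['num_bedrooms'] * houses['sqft']
print(houses)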
The PolynomialFeatures transformer we saw earlier generates interaction terms by default. When we generated degree-2 features for $[x_1, x_2]$, we got $[1, x_1, x_2, x_1^2, x_1 x_2, x_2^2]$. The $x_1 x_2$ term is the interaction term. If you only want interaction terms without the pure polynomial terms (like $x_1^2$), you can set interaction_only=True.
# Sample data with two features
X_two = np.arange(8).reshape(4, 2)
print("Original Features:")
print(pd.DataFrame(X_two, columns=['x1', 'x2']))
# Interaction terms only (equivalent to degree=2, interaction_only=True)
poly_interact = PolynomialFeatures(degree=2, interaction_only=True, include_bias=False)
X_interact = poly_interact.fit_transform(X_two)
print("\nFeatures with Interactions Only:")
print(pd.DataFrame(X_interact, columns=poly_interact.get_feature_names_out(['x1', 'x2'])))
# Polynomial and Interaction terms (degree=2, interaction_only=False)
poly_full = PolynomialFeatures(degree=2, include_bias=False)
X_full_poly = poly_full.fit_transform(X_two)
print("\nFeatures with Polynomial (degree 2) and Interactions:")
print(pd.DataFrame(X_full_poly, columns=poly_full.get_feature_names_out(['x1', 'x2'])))
Output:
Original Features:
   x1  x2
0   0   1
1   2   3
2   4   5
3   6   7

Features with Interactions Only:
    x1   x2  x1 x2
0  0.0  1.0    0.0
1  2.0  3.0    6.0
2  4.0  5.0   20.0
3  6.0  7.0   42.0

Features with Polynomial (degree 2) and Interactions:
    x1   x2  x1^2  x1 x2  x2^2
0  0.0  1.0   0.0    0.0   1.0
1  2.0  3.0   4.0    6.0   9.0
2  4.0  5.0  16.0   20.0  25.0
3  6.0  7.0  36.0   42.0  49.0
Interaction terms allow models to learn how features modify each other's effects, leading to potentially more accurate predictions when such relationships exist.
It is generally a good idea to apply feature scaling (e.g., with StandardScaler) after generating polynomial and interaction features, especially for models sensitive to feature magnitudes (e.g., regularized linear models, SVMs, neural networks).

By thoughtfully creating polynomial and interaction features, you provide your machine learning models with the building blocks needed to understand more complex relationships within your data, often leading to improved predictive performance. However, this power comes with the responsibility of managing the resulting model complexity.
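Putting these pieces together, one common pattern is to chain the expansion, scaling, and a regularized linear model in a single Pipeline. The sketch below uses Ridge regression with placeholder degree and regularization strength on synthetic data, so treat the specific values as assumptions rather than recommendations:
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import Ridge
# Synthetic data just to make the example runnable
rng = np.random.RandomState(0)
X = rng.rand(200, 3)
y = X[:, 0] * X[:, 1] + X[:, 2] ** 2 + 0.1 * rng.randn(200)
model = Pipeline([
    ('poly', PolynomialFeatures(degree=2, include_bias=False)),  # expand features
    ('scale', StandardScaler()),                                 # rescale the expanded features
    ('ridge', Ridge(alpha=1.0)),                                 # regularize the many coefficients
])
model.fit(X, y)
print(model.score(X, y))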