Linear models, such as Linear Regression or Logistic Regression, are powerful and interpretable, but they inherently assume a linear relationship between the features and the target variable. What happens when this relationship isn't a straight line? One way to extend these models to capture non-linear patterns is by creating polynomial features.
Essentially, polynomial features are new features derived by raising existing numerical features to a power (like $x^2$, $x^3$) or by multiplying features together (interaction terms like $x_1 x_2$). By adding these non-linear terms to our dataset, a linear model can learn curved relationships.
Consider a simple dataset with one feature, $x$. If the true relationship with the target $y$ is quadratic, say $y \approx ax^2 + bx + c$, a standard linear model fitting $y \approx wx + b$ will perform poorly. However, if we create a new feature $x^2$ and fit a linear model using both $x$ and $x^2$, the model becomes $y \approx w_1 x + w_2 x^2 + b$. This is still a linear model with respect to the coefficients $(w_1, w_2, b)$, but it can now model a quadratic relationship between the original feature $x$ and the target $y$.
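To make this concrete, here is a minimal sketch (NumPy only, with made-up coefficients) showing that fitting $y$ against the columns $x$ and $x^2$ is still an ordinary least-squares problem in the coefficients:

```python
import numpy as np

# Synthetic quadratic data with illustrative coefficients (a=0.8, b=0.5, c=2)
rng = np.random.default_rng(0)
x = rng.uniform(-5, 5, size=50)
y = 0.8 * x**2 + 0.5 * x + 2 + rng.normal(scale=0.5, size=50)

# Design matrix with columns [x, x^2, 1]: the model is linear in (w1, w2, b)
A = np.column_stack([x, x**2, np.ones_like(x)])
(w1, w2, b), *_ = np.linalg.lstsq(A, y, rcond=None)
print(f"Recovered w1={w1:.2f}, w2={w2:.2f}, b={b:.2f}")  # close to 0.5, 0.8, 2.0
```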
Scikit-learn provides a convenient transformer, `PolynomialFeatures`, within its `preprocessing` module to generate these features automatically.
Let's see it in action. Suppose we have a simple dataset with two features, `f1` and `f2`:
```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Sample data: 3 samples, 2 features
X = np.array([[2, 3],
              [4, 1],
              [0, 5]])

# Initialize PolynomialFeatures transformer for degree 2
# include_bias=False removes the constant term (column of 1s)
poly = PolynomialFeatures(degree=2, include_bias=False)

# Fit and transform the data
X_poly = poly.fit_transform(X)

print("Original features:\n", X)
print("\nPolynomial features (degree=2):\n", X_poly)
print("\nFeature names:", poly.get_feature_names_out(['f1', 'f2']))
```
The output would be:
```
Original features:
 [[2 3]
 [4 1]
 [0 5]]

Polynomial features (degree=2):
 [[ 2.  3.  4.  6.  9.]   # f1, f2, f1^2, f1*f2, f2^2
 [ 4.  1. 16.  4.  1.]
 [ 0.  5.  0.  0. 25.]]

Feature names: ['f1' 'f2' 'f1^2' 'f1 f2' 'f2^2']
```
As you can see, `PolynomialFeatures(degree=2)` generated the original features (`f1`, `f2`), the squared terms (`f1^2`, `f2^2`), and the interaction term (`f1*f2`).
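Continuing from the snippet above, it can be handy to wrap the transformed array in a pandas DataFrame so the columns are labeled (a small convenience, not a required step):

```python
import pandas as pd

# Label the transformed columns with the generated feature names
# (assumes X_poly and poly from the previous snippet are still in scope)
df_poly = pd.DataFrame(X_poly, columns=poly.get_feature_names_out(['f1', 'f2']))
print(df_poly)
```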
The key parameters for `PolynomialFeatures` are:

- `degree`: The maximum degree of the polynomial features. A degree of 2 generates terms up to $x^2$ and $x_1 x_2$; degree 3 generates terms up to $x^3$, $x_1^2 x_2$, $x_1 x_2^2$, etc.
- `interaction_only`: If set to `True`, only interaction features (products of distinct features like $x_1 x_2$) are produced, not higher-order terms of single features (like $x_1^2$). Defaults to `False`. See the short sketch after this list.
- `include_bias`: If set to `True` (the default), it includes a bias column (a feature containing only 1s). This is often useful for linear models but can sometimes be redundant if the subsequent estimator handles the intercept. We set it to `False` in the example for clarity.
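For instance, here is a quick look at `interaction_only=True` on the same small `X` from above; only the original features and their pairwise product remain:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.array([[2, 3],
              [4, 1],
              [0, 5]])

# Only products of distinct features are added; no squared terms
poly_inter = PolynomialFeatures(degree=2, interaction_only=True, include_bias=False)
print(poly_inter.fit_transform(X))
# [[ 2.  3.  6.]
#  [ 4.  1.  4.]
#  [ 0.  5.  0.]]   -> columns: f1, f2, f1*f2
print(poly_inter.get_feature_names_out(['f1', 'f2']))  # ['f1' 'f2' 'f1 f2']
```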
Let's visualize how adding polynomial features allows a linear model to fit non-linear data. We'll create synthetic data where $y$ is roughly a quadratic function of $x$, plus some noise.
```python
import numpy as np
import pandas as pd
import plotly.graph_objects as go
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Generate synthetic non-linear data
np.random.seed(42)
n_samples = 100
X = np.random.rand(n_samples, 1) * 10 - 5  # Feature values between -5 and 5
y = 0.8 * X**2 + 0.5 * X + 2 + np.random.randn(n_samples, 1) * 4  # Quadratic relationship + noise

# 1. Fit standard Linear Regression
linear_reg = LinearRegression()
linear_reg.fit(X, y)
y_pred_linear = linear_reg.predict(X)

# 2. Create polynomial features (degree 2) and fit Linear Regression
poly_features = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly_features.fit_transform(X)
poly_reg = LinearRegression()
poly_reg.fit(X_poly, y)

# Create predictions on a grid for smooth lines
X_grid = np.arange(-5, 5, 0.1).reshape(-1, 1)
X_grid_poly = poly_features.transform(X_grid)
y_pred_poly = poly_reg.predict(X_grid_poly)
y_pred_linear_grid = linear_reg.predict(X_grid)  # Linear model prediction on grid

# Create Plotly chart
fig = go.Figure()

# Add scatter plot for original data
fig.add_trace(go.Scatter(x=X.flatten(), y=y.flatten(), mode='markers', name='Original Data',
                         marker=dict(color='#228be6', opacity=0.7)))

# Add line for standard linear regression fit
fig.add_trace(go.Scatter(x=X_grid.flatten(), y=y_pred_linear_grid.flatten(), mode='lines', name='Linear Fit',
                         line=dict(color='#fa5252', width=2)))

# Add line for polynomial regression fit
fig.add_trace(go.Scatter(x=X_grid.flatten(), y=y_pred_poly.flatten(), mode='lines', name='Polynomial Fit (degree=2)',
                         line=dict(color='#51cf66', width=2)))

fig.update_layout(
    title="Linear vs. Polynomial Regression Fit",
    xaxis_title="Feature (x)",
    yaxis_title="Target (y)",
    legend_title="Model",
    template="plotly_white",
    width=700,
    height=400,
    margin=dict(l=20, r=20, t=50, b=20)  # Reduce margins
)

# fig.show()  # In a real environment, this would display the chart

# Chart JSON (single line for embedding)
chart_json = fig.to_json(pretty=False)
print(f"```plotly\n{chart_json}\n```")
```
The standard linear fit (red line) fails to capture the curve in the data. The polynomial fit (green line), using degree-2 features ($x$ and $x^2$), models the underlying quadratic relationship much better.
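To put a rough number on that difference, you could compare training-set $R^2$ scores for the two fits (continuing from the variables above; this is an illustrative check only, since a fair comparison would use held-out data):

```python
from sklearn.metrics import r2_score

# Training-set R^2 for each model (assumes X, y, X_poly, linear_reg, and
# poly_reg from the previous snippet are still in scope)
print("Linear fit R^2:    ", r2_score(y, linear_reg.predict(X)))
print("Polynomial fit R^2:", r2_score(y, poly_reg.predict(X_poly)))
```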
While powerful, polynomial features require careful consideration:

- Feature explosion: the number of generated features grows quickly with the polynomial degree and the number of input features, which increases computation and memory costs. The snippet after this list illustrates the growth.
- Overfitting: high-degree polynomials can fit noise in the training data, so careful degree selection and regularization (e.g., Ridge) are often needed.
- Feature scaling: it is generally advisable to scale your features (e.g., with `StandardScaler` or `MinMaxScaler`) before applying `PolynomialFeatures`. This is because high-degree polynomial terms can result in very large or very small values, potentially causing numerical instability or making the model sensitive to features with naturally larger ranges. Scaling ensures all features contribute more evenly.
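As a quick illustration of the first point, the fitted transformer exposes `n_output_features_`, so you can check how fast the feature count grows with the degree (here on a hypothetical dataset with 10 input features):

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X_wide = np.random.rand(5, 10)  # 5 samples, 10 input features

# The number of generated features grows combinatorially with the degree
# (roughly C(n_features + degree, degree) terms)
for degree in (2, 3, 4):
    poly = PolynomialFeatures(degree=degree, include_bias=False).fit(X_wide)
    print(f"degree={degree}: {poly.n_output_features_} features")
```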
Here's how you might integrate scaling, polynomial feature generation, and a regularized linear model using a Scikit-learn `Pipeline`:
```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import Ridge

# Assuming X and y from the previous example are available

# Create a pipeline
poly_pipeline = Pipeline([
    ('scaler', StandardScaler()),                                # Scale features first
    ('poly', PolynomialFeatures(degree=2, include_bias=False)),  # Generate polynomial features
    ('ridge_reg', Ridge(alpha=1.0))                              # Use Ridge regression for regularization
])

# Fit the pipeline
poly_pipeline.fit(X, y)

# Make predictions (pipeline handles scaling and transformation)
# y_pred_pipeline = poly_pipeline.predict(X)

print("Pipeline fitted successfully.")
# print("First 5 predictions:", y_pred_pipeline[:5].flatten())
```
In summary, polynomial features provide a straightforward way to add non-linearity to models that are fundamentally linear. By generating squared terms, cubic terms, and interactions, you give these models the capacity to learn more complex patterns in the data. However, this power comes with the responsibility of managing the increased feature space and the potential for overfitting, often requiring careful degree selection, feature scaling, and regularization.