Linear models, such as Linear Regression or Logistic Regression, are powerful and interpretable, but they inherently assume a linear relationship between the features and the target variable. What happens when this relationship isn't a straight line? One way to extend these models to capture non-linear patterns is by creating polynomial features.
Essentially, polynomial features are new features derived by raising existing numerical features to a power (like $x^2$, $x^3$) or by multiplying features together (interaction terms like $x_1 x_2$). By adding these non-linear terms to our dataset, a linear model can learn curved relationships.
Consider a simple dataset with one feature, $x$. If the true relationship with the target $y$ is quadratic, say $y \approx ax^2 + bx + c$, a standard linear model fitting $y \approx wx + b$ will perform poorly. However, if we create a new feature $x^2$ and fit a linear model using both $x$ and $x^2$, the model becomes $y \approx w_1 x + w_2 x^2 + b$. This is still a linear model with respect to the coefficients $(w_1, w_2, b)$, but it can now model a quadratic relationship between the original feature $x$ and the target $y$.
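To make this concrete, here is a minimal sketch (NumPy only, with made-up coefficients) showing that fitting $y$ against the columns $x$ and $x^2$ is still an ordinary least-squares problem in the coefficients:

```python
import numpy as np

# Synthetic quadratic data with illustrative coefficients (a=0.8, b=0.5, c=2)
rng = np.random.default_rng(0)
x = rng.uniform(-5, 5, size=50)
y = 0.8 * x**2 + 0.5 * x + 2 + rng.normal(scale=0.5, size=50)

# Design matrix with columns [x, x^2, 1]: the model is linear in (w1, w2, b)
A = np.column_stack([x, x**2, np.ones_like(x)])
(w1, w2, b), *_ = np.linalg.lstsq(A, y, rcond=None)
print(f"Recovered w1={w1:.2f}, w2={w2:.2f}, b={b:.2f}")  # close to 0.5, 0.8, 2.0
```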
Scikit-learn provides a convenient transformer, `PolynomialFeatures`, within its `preprocessing` module to generate these features automatically.
Let's see it in action. Suppose we have a simple dataset with two features, `f1` and `f2`:
```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Sample data: 3 samples, 2 features
X = np.array([[2, 3],
              [4, 1],
              [0, 5]])

# Initialize PolynomialFeatures transformer for degree 2
# include_bias=False removes the constant term (column of 1s)
poly = PolynomialFeatures(degree=2, include_bias=False)

# Fit and transform the data
X_poly = poly.fit_transform(X)

print("Original features:\n", X)
print("\nPolynomial features (degree=2):\n", X_poly)
print("\nFeature names:", poly.get_feature_names_out(['f1', 'f2']))
```
The output would be:
```
Original features:
 [[2 3]
 [4 1]
 [0 5]]

Polynomial features (degree=2):
 [[ 2.  3.  4.  6.  9.]   # f1, f2, f1^2, f1*f2, f2^2
 [ 4.  1. 16.  4.  1.]
 [ 0.  5.  0.  0. 25.]]

Feature names: ['f1' 'f2' 'f1^2' 'f1 f2' 'f2^2']
```
As you can see, `PolynomialFeatures(degree=2)` generated the original features (`f1`, `f2`), the squared terms (`f1^2`, `f2^2`), and the interaction term (`f1*f2`).
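Continuing from the snippet above, it can be handy to wrap the transformed array in a pandas DataFrame so the columns are labeled (a small convenience, not a required step):

```python
import pandas as pd

# Label the transformed columns with the generated feature names
# (assumes X_poly and poly from the previous snippet are still in scope)
df_poly = pd.DataFrame(X_poly, columns=poly.get_feature_names_out(['f1', 'f2']))
print(df_poly)
```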
The key parameters for `PolynomialFeatures` are:

- `degree`: The maximum degree of the polynomial features. A degree of 2 generates terms up to $x^2$ and $x_1 x_2$; degree 3 generates terms up to $x^3$, $x_1^2 x_2$, $x_1 x_2^2$, etc.
- `interaction_only`: If set to `True`, only interaction features (products of distinct features like $x_1 x_2$) are produced, not higher-order terms of single features (like $x_1^2$). Defaults to `False`. See the short sketch after this list.
- `include_bias`: If set to `True` (the default), it includes a bias column (a feature containing only 1s). This is often useful for linear models but can sometimes be redundant if the subsequent estimator handles the intercept. We set it to `False` in the example for clarity.
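For instance, here is a quick look at `interaction_only=True` on the same small `X` from above; only the original features and their pairwise product remain:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.array([[2, 3],
              [4, 1],
              [0, 5]])

# Only products of distinct features are added; no squared terms
poly_inter = PolynomialFeatures(degree=2, interaction_only=True, include_bias=False)
print(poly_inter.fit_transform(X))
# [[ 2.  3.  6.]
#  [ 4.  1.  4.]
#  [ 0.  5.  0.]]   -> columns: f1, f2, f1*f2
print(poly_inter.get_feature_names_out(['f1', 'f2']))  # ['f1' 'f2' 'f1 f2']
```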
Let's visualize how adding polynomial features allows a linear model to fit non-linear data. We'll create synthetic data where $y$ is roughly a quadratic function of $x$, plus some noise.
```python
import numpy as np
import pandas as pd
import plotly.graph_objects as go
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Generate synthetic non-linear data
np.random.seed(42)
n_samples = 100
X = np.random.rand(n_samples, 1) * 10 - 5  # Feature values between -5 and 5
y = 0.8 * X**2 + 0.5 * X + 2 + np.random.randn(n_samples, 1) * 4  # Quadratic relationship + noise

# 1. Fit standard Linear Regression
linear_reg = LinearRegression()
linear_reg.fit(X, y)
y_pred_linear = linear_reg.predict(X)

# 2. Create polynomial features (degree 2) and fit Linear Regression
poly_features = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly_features.fit_transform(X)
poly_reg = LinearRegression()
poly_reg.fit(X_poly, y)

# Create predictions on a grid for smooth lines
X_grid = np.arange(-5, 5, 0.1).reshape(-1, 1)
X_grid_poly = poly_features.transform(X_grid)
y_pred_poly = poly_reg.predict(X_grid_poly)
y_pred_linear_grid = linear_reg.predict(X_grid)  # Linear model prediction on grid

# Create Plotly chart
fig = go.Figure()

# Add scatter plot for original data
fig.add_trace(go.Scatter(x=X.flatten(), y=y.flatten(), mode='markers', name='Original Data',
                         marker=dict(color='#228be6', opacity=0.7)))

# Add line for standard linear regression fit
fig.add_trace(go.Scatter(x=X_grid.flatten(), y=y_pred_linear_grid.flatten(), mode='lines', name='Linear Fit',
                         line=dict(color='#fa5252', width=2)))

# Add line for polynomial regression fit
fig.add_trace(go.Scatter(x=X_grid.flatten(), y=y_pred_poly.flatten(), mode='lines', name='Polynomial Fit (degree=2)',
                         line=dict(color='#51cf66', width=2)))

fig.update_layout(
    title="Linear vs. Polynomial Regression Fit",
    xaxis_title="Feature (x)",
    yaxis_title="Target (y)",
    legend_title="Model",
    template="plotly_white",
    width=700,
    height=400,
    margin=dict(l=20, r=20, t=50, b=20)  # Reduce margins
)

# fig.show()  # In a real environment, this would display the chart

# Chart JSON (single line for embedding)
chart_json = fig.to_json(pretty=False)
print(f"```plotly\n{chart_json}\n```")
```
The standard linear fit (red line) fails to capture the curve in the data. The polynomial fit (green line), using degree-2 features ($x$ and $x^2$), models the underlying quadratic relationship much better.
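To put a rough number on that difference, you could compare training-set $R^2$ scores for the two fits (continuing from the variables above; this is an illustrative check only, since a fair comparison would use held-out data):

```python
from sklearn.metrics import r2_score

# Training-set R^2 for each model (assumes X, y, X_poly, linear_reg, and
# poly_reg from the previous snippet are still in scope)
print("Linear fit R^2:    ", r2_score(y, linear_reg.predict(X)))
print("Polynomial fit R^2:", r2_score(y, poly_reg.predict(X_poly)))
```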
While powerful, polynomial features require careful consideration:

- Feature explosion: the number of generated features grows quickly with the polynomial degree and the number of input features, which increases computation and memory costs. The snippet after this list illustrates the growth.
- Overfitting: high-degree polynomials can fit noise in the training data, so careful degree selection and regularization (e.g., Ridge) are often needed.
- Feature scaling: it is generally advisable to scale your features (e.g., with `StandardScaler` or `MinMaxScaler`) before applying `PolynomialFeatures`. This is because high-degree polynomial terms can result in very large or very small values, potentially causing numerical instability or making the model sensitive to features with naturally larger ranges. Scaling ensures all features contribute more evenly.
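As a quick illustration of the first point, the fitted transformer exposes `n_output_features_`, so you can check how fast the feature count grows with the degree (here on a hypothetical dataset with 10 input features):

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X_wide = np.random.rand(5, 10)  # 5 samples, 10 input features

# The number of generated features grows combinatorially with the degree
# (roughly C(n_features + degree, degree) terms)
for degree in (2, 3, 4):
    poly = PolynomialFeatures(degree=degree, include_bias=False).fit(X_wide)
    print(f"degree={degree}: {poly.n_output_features_} features")
```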
Here's how you might integrate scaling, polynomial feature generation, and a regularized linear model using a Scikit-learn `Pipeline`:
```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import Ridge

# Assuming X and y from the previous example are available

# Create a pipeline
poly_pipeline = Pipeline([
    ('scaler', StandardScaler()),                                # Scale features first
    ('poly', PolynomialFeatures(degree=2, include_bias=False)),  # Generate polynomial features
    ('ridge_reg', Ridge(alpha=1.0))                              # Use Ridge regression for regularization
])

# Fit the pipeline
poly_pipeline.fit(X, y)

# Make predictions (pipeline handles scaling and transformation)
# y_pred_pipeline = poly_pipeline.predict(X)

print("Pipeline fitted successfully.")
# print("First 5 predictions:", y_pred_pipeline[:5].flatten())
```
In summary, polynomial features provide a straightforward way to add non-linearity to models that are fundamentally linear. By generating squared terms, cubic terms, and interactions, you give these models the capacity to learn more complex patterns in the data. However, this power comes with the responsibility of managing the increased feature space and the potential for overfitting, often requiring careful degree selection, feature scaling, and regularization.