While Recursive Feature Elimination (RFE) works by starting with all features and pruning the least important ones, Sequential Feature Selection (SFS) offers alternative greedy search strategies. Instead of relying solely on model coefficients or feature importances like RFE, SFS directly evaluates model performance (using a chosen scoring metric) on different subsets of features. It iteratively builds (forward selection) or shrinks (backward selection) the feature set.
SFS methods belong to the wrapper category because they wrap the feature selection process around a specific machine learning model, using its performance as the objective function to guide the search. This makes them computationally more intensive than filter methods but potentially more attuned to the chosen model's needs.
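To put a rough number on "computationally more intensive", the sketch below counts how many candidate subsets a greedy sequential search scores. It assumes every remaining candidate is tried exactly once per iteration, as in the forward and backward procedures described in this section. The sizes (15 features, 5 selected, 5 CV folds) match the scikit-learn example later on. The helper name `count_subset_evaluations` is hypothetical.

```python
import math


def count_subset_evaluations(n_features, n_select, direction):
    """Count the candidate subsets a greedy sequential search scores,
    assuming every remaining candidate is tried once per iteration."""
    if not 0 < n_select <= n_features:
        raise ValueError("n_select must be between 1 and n_features")
    if direction == "forward":
        # Iteration i (0-based) tries each of the n_features - i unselected features.
        return sum(n_features - i for i in range(n_select))
    if direction == "backward":
        # Iterations try removing each feature from sets of size
        # n_features, n_features - 1, ..., n_select + 1.
        return sum(range(n_select + 1, n_features + 1))
    raise ValueError("direction must be 'forward' or 'backward'")


cv_folds = 5
for direction in ("forward", "backward"):
    subsets = count_subset_evaluations(15, 5, direction)
    print(f"{direction}: {subsets} subsets x {cv_folds} folds = {subsets * cv_folds} fits")

# For comparison: exhaustively scoring every 5-feature subset of 15 features
print(f"exhaustive: {math.comb(15, 5)} subsets")
```

For these sizes, forward selection scores 65 subsets (325 model fits) and backward selection scores 105 subsets (525 fits). An exhaustive search would need 3003 subsets. A filter method, by contrast, scores each feature once without fitting the model at all.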
Sequential Forward Selection (often also abbreviated SFS) starts with an empty set of features. In each iteration, it evaluates adding each feature not currently in the selected set. The feature whose addition yields the best performance (according to the chosen scoring metric, typically evaluated via cross-validation) is added to the set. This process continues until a predefined number of features has been selected, or until adding any remaining feature no longer improves performance significantly.
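Algorithm Steps (Forward Selection):
1. Start with an empty set of selected features.
2. For each feature not yet selected, temporarily add it and evaluate the model on the enlarged set using the chosen scoring metric (typically with cross-validation).
3. Permanently add the feature that produced the best score.
4. Repeat steps 2 and 3 until the target number of features is reached or no remaining feature improves the score significantly.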
Imagine building a toolkit. You start with nothing and add one tool at a time, always picking the tool that helps you perform the task best with the tools you already have.
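The forward procedure can be sketched by hand with `cross_val_score`. This is a minimal illustration, not scikit-learn's internal implementation. The function name `forward_selection` and its optional `tol` early-stopping rule are hypothetical. The synthetic data and the scaler-plus-logistic-regression model are assumptions chosen to mirror the scikit-learn example in this section.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def forward_selection(estimator, X, y, n_features_to_select, tol=None,
                      cv=5, scoring="accuracy"):
    """Greedy forward selection over the columns of a NumPy array X.

    Returns the chosen column indices (in order of addition) and the
    mean CV score recorded after each addition.
    """
    selected, history = [], []
    while len(selected) < n_features_to_select:
        candidates = [j for j in range(X.shape[1]) if j not in selected]
        # Score the current set plus each unselected feature.
        scores = {
            j: cross_val_score(estimator, X[:, selected + [j]], y,
                               cv=cv, scoring=scoring).mean()
            for j in candidates
        }
        best = max(scores, key=scores.get)
        # Optional early stop: the best addition no longer helps enough.
        if tol is not None and history and scores[best] - history[-1] < tol:
            break
        selected.append(best)
        history.append(scores[best])
    return selected, history


X, y = make_classification(n_samples=200, n_features=15, n_informative=5,
                           n_redundant=5, random_state=42)
model = make_pipeline(StandardScaler(), LogisticRegression(solver="liblinear"))
selected, history = forward_selection(model, X, y, n_features_to_select=5)
for step, (idx, score) in enumerate(zip(selected, history), start=1):
    print(f"step {step}: added feature_{idx + 1}, mean CV accuracy {score:.3f}")
```

Each round refits the model once per fold for every unselected feature. This is where the computational cost of wrapper methods comes from.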
Sequential Backward Selection (SBS), sometimes called Sequential Backward Elimination, operates in the opposite direction. It starts with the full set of available features. In each iteration, it evaluates removing each feature currently in the set. The feature whose removal causes the smallest decrease (or largest increase) in model performance is removed. This continues until the desired number of features is reached.
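Algorithm Steps (Backward Selection):
1. Start with the full set of features.
2. For each feature currently in the set, temporarily remove it and evaluate the model on the reduced set.
3. Permanently remove the feature whose removal produced the best score (the smallest decrease, or largest increase, in performance).
4. Repeat steps 2 and 3 until the desired number of features remains.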
This is like starting with a cluttered toolbox and removing one tool at a time, discarding the one you miss the least, until you have a streamlined, effective set.
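A matching hand-rolled sketch of backward elimination follows. It is again illustrative rather than scikit-learn's own code. The function name `backward_selection` is hypothetical, and the data and model are the same assumptions as in the forward sketch.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def backward_selection(estimator, X, y, n_features_to_select,
                       cv=5, scoring="accuracy"):
    """Greedy backward elimination over the columns of a NumPy array X.

    Returns the surviving column indices and a log of
    (removed_index, mean CV score without it) pairs.
    """
    remaining = list(range(X.shape[1]))
    removed = []
    while len(remaining) > n_features_to_select:
        scores = {}
        for j in remaining:
            subset = [k for k in remaining if k != j]
            scores[j] = cross_val_score(estimator, X[:, subset], y,
                                        cv=cv, scoring=scoring).mean()
        # The highest score after removal marks the feature we miss least:
        # its removal causes the smallest decrease (or largest increase).
        least_useful = max(scores, key=scores.get)
        remaining.remove(least_useful)
        removed.append((least_useful, scores[least_useful]))
    return remaining, removed


X, y = make_classification(n_samples=200, n_features=15, n_informative=5,
                           n_redundant=5, random_state=42)
model = make_pipeline(StandardScaler(), LogisticRegression(solver="liblinear"))
remaining, removed = backward_selection(model, X, y, n_features_to_select=5)
print("kept:", [f"feature_{i + 1}" for i in remaining])
for idx, score in removed:
    print(f"removed feature_{idx + 1}: mean CV accuracy {score:.3f}")
```

Going from 15 features down to 5 takes ten elimination rounds. The early rounds score subsets that contain nearly all features, so backward selection is usually the more expensive direction when the target size is small.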
Scikit-learn provides the SequentialFeatureSelector class (in sklearn.feature_selection) for performing both forward and backward selection.
```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline

# Generate synthetic classification data
X, y = make_classification(n_samples=200, n_features=15, n_informative=5,
                           n_redundant=5, n_repeated=0, n_classes=2,
                           n_clusters_per_class=2, random_state=42)
X = pd.DataFrame(X, columns=[f'feature_{i+1}' for i in range(15)])

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Pipeline with scaling and logistic regression. Passing the whole pipeline
# to SFS refits the scaler inside every CV fold, so the validation folds
# never influence the scaling statistics.
pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('model', LogisticRegression(solver='liblinear', random_state=42))
])

# --- Forward Selection ---
print("Performing Forward Selection...")
sfs_forward = SequentialFeatureSelector(
    estimator=pipe,              # Scaler + model, evaluated together
    n_features_to_select=5,      # Target number of features
    direction='forward',         # Specify forward selection
    scoring='accuracy',          # Performance metric
    cv=5,                        # Cross-validation folds
    n_jobs=-1                    # Use all available CPU cores
)
sfs_forward.fit(X_train, y_train)

# Get selected feature names from the boolean support mask
selected_features_mask_fwd = sfs_forward.get_support()
selected_feature_names_fwd = X.columns[selected_features_mask_fwd]
print(f"Selected features (Forward): {selected_feature_names_fwd.tolist()}")
print(f"Number of features selected: {sfs_forward.n_features_to_select_}")

# --- Backward Selection ---
print("\nPerforming Backward Selection...")
sfs_backward = SequentialFeatureSelector(
    estimator=pipe,
    n_features_to_select=5,      # Target number of features
    direction='backward',        # Specify backward selection
    scoring='accuracy',
    cv=5,
    n_jobs=-1
)
sfs_backward.fit(X_train, y_train)

# Get selected feature names from the boolean support mask
selected_features_mask_bwd = sfs_backward.get_support()
selected_feature_names_bwd = X.columns[selected_features_mask_bwd]
print(f"Selected features (Backward): {selected_feature_names_bwd.tolist()}")
print(f"Number of features selected: {sfs_backward.n_features_to_select_}")

# transform() keeps only the selected columns (it applies the mask; no scaling)
# X_train_sfs_fwd = sfs_forward.transform(X_train)
# X_test_sfs_fwd = sfs_forward.transform(X_test)
```
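One way to carry the selection through to a held-out evaluation is to make the selector a step of an outer pipeline. The selection and the final model are then fit together on the training split, and the test split is touched only once. The step names and the choice to rescale after selection are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=200, n_features=15, n_informative=5,
                           n_redundant=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# The selector's own estimator is used only to score candidate subsets.
selector = SequentialFeatureSelector(
    Pipeline([("scaler", StandardScaler()),
              ("model", LogisticRegression(solver="liblinear"))]),
    n_features_to_select=5, direction="forward", cv=5,
)
# Outer pipeline: keep the selected columns, then scale and fit the final model.
full_model = Pipeline([
    ("select", selector),
    ("scaler", StandardScaler()),
    ("model", LogisticRegression(solver="liblinear")),
])
full_model.fit(X_train, y_train)
test_accuracy = full_model.score(X_test, y_test)
n_kept = full_model.named_steps["select"].get_support().sum()
print(f"features kept: {n_kept}, test accuracy: {test_accuracy:.3f}")
```

Passing `full_model` to `cross_val_score` would repeat the entire selection inside every outer fold. This gives a nested estimate of how well the whole procedure generalizes, at a correspondingly higher cost.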
Key parameters for SequentialFeatureSelector:
- estimator: The machine learning model used to evaluate feature subsets.
- n_features_to_select: The target number of features. Can be an integer, a float between 0 and 1 (a fraction of the features), or 'auto'. With 'auto', half of the features are selected when tol is None; otherwise, selection stops once an iteration changes the score by less than tol.
- direction: 'forward' or 'backward'.
- scoring: The metric used to evaluate performance (e.g., 'accuracy', 'roc_auc', 'r2', 'neg_mean_squared_error'). Must be a valid Scikit-learn scoring string or a callable scorer.
- cv: Number of cross-validation folds or a CV splitter strategy. Essential for robust performance evaluation.
- n_jobs: Number of CPU cores to use for parallel execution during cross-validation. -1 uses all available cores.

Note that forward and backward selection do not necessarily yield the same set of features, as they explore the feature space differently.
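Advantages:
- Subsets are judged by the actual model and scoring metric, so the selection reflects what that model needs rather than a proxy such as coefficients or univariate statistics.
- Works with any estimator, including models that expose neither coefficients nor feature importances (which RFE relies on).
- Each candidate is scored together with the features already selected, so features are judged in context rather than in isolation.

Disadvantages:
- Computationally expensive: every iteration refits the model (with cross-validation) once per candidate feature.
- Greedy: a feature added (forward) or removed (backward) is never reconsidered, so the final subset is not guaranteed to be the best subset of that size.
- Scoring many subsets on the same data can overfit the selection to the cross-validation scores.

SFS is a valuable wrapper method when:
- the number of features is moderate enough that the repeated model fits remain affordable;
- you want the feature subset tailored to a specific model and metric;
- the model provides no coefficients or feature importances, which rules out RFE.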
Like other wrapper methods, SFS requires careful consideration of the trade-off between the potential for improved model performance and the computational resources required. It's often a good idea to compare its results with those from filter and embedded methods.
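As a starting point for such a comparison, the sketch below puts a filter method next to SFS on the same synthetic data. The filter is SelectKBest with the ANOVA F-statistic (f_classif). Both the choice of filter and k=5 are assumptions made for illustration. Overlap between the two selections is data-dependent.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, SequentialFeatureSelector, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=200, n_features=15, n_informative=5,
                           n_redundant=5, random_state=42)

# Filter method: rank features by univariate ANOVA F-score, no model involved
filter_idx = set(SelectKBest(f_classif, k=5).fit(X, y).get_support(indices=True))

# Wrapper method: forward SFS scored by cross-validated accuracy
model = make_pipeline(StandardScaler(), LogisticRegression(solver="liblinear"))
sfs = SequentialFeatureSelector(model, n_features_to_select=5, direction="forward", cv=5)
sfs_idx = set(sfs.fit(X, y).get_support(indices=True))

print("filter (SelectKBest):", sorted(filter_idx))
print("wrapper (SFS):       ", sorted(sfs_idx))
print("selected by both:    ", sorted(filter_idx & sfs_idx))
```

The filter scores each feature once, independently of the others. SFS fits the model hundreds of times here but judges features in combination. Features that both methods pick are reasonable candidates to trust.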
© 2025 ApX Machine Learning