While manually crafting interaction terms, polynomial features, or extracting components from dates offers significant improvements, this process often relies heavily on domain knowledge, intuition, and time-consuming experimentation. As the dimensionality of your dataset increases, the sheer number of potential feature combinations can become overwhelming, making exhaustive manual exploration infeasible. This naturally leads to the question: can we automate parts of this creative process?
Automated feature creation refers to algorithms and frameworks designed to systematically generate a large number of candidate features from existing ones. Instead of deciding by hand to compute fa × fb or log(fc), you let these tools apply a wide range of mathematical operations and transformations automatically.
The core idea is to define a set of basic building blocks or operations (e.g., addition, subtraction, multiplication, division, absolute value, logarithm, trigonometric functions) and apply them to the initial features. These newly generated features can then be used as inputs for subsequent operations, potentially creating highly complex features through multiple steps.
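For example, given initial features f1 and f2, an automated system might generate features such as f1 + f2, f1 - f2, f1 × f2, f1 / f2, and log(f1), and then combine those again into expressions like log(f1) × f2.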
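A minimal sketch of this kind of generation, assuming pandas and NumPy are available. The toy data, the operation tables `UNARY` and `BINARY`, and the helper `generate_candidates` are all hypothetical names chosen for illustration, not part of any library.

```python
import itertools

import numpy as np
import pandas as pd

# Hypothetical toy data; f1 and f2 stand in for any two numeric features.
df = pd.DataFrame({"f1": [1.0, 2.0, 4.0, 8.0], "f2": [2.0, 2.0, 5.0, 10.0]})

# Building blocks: unary operations applied to each column,
# binary operations applied to each unordered pair of columns.
UNARY = {"log": np.log, "sqrt": np.sqrt, "abs": np.abs}
BINARY = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
    "div": lambda a, b: a / b,
}


def generate_candidates(frame: pd.DataFrame) -> pd.DataFrame:
    """Return one round of candidate features built from the columns of `frame`."""
    new_cols = {}
    with np.errstate(all="ignore"):  # silence warnings from log/sqrt of invalid inputs
        for name, func in UNARY.items():
            for col in frame.columns:
                new_cols[f"{name}({col})"] = func(frame[col])
        for name, func in BINARY.items():
            # Simplification: each pair is used in one order only, so
            # sub(b, a) and div(b, a) are not generated.
            for a, b in itertools.combinations(frame.columns, 2):
                new_cols[f"{name}({a},{b})"] = func(frame[a], frame[b])
    out = pd.DataFrame(new_cols, index=frame.index)
    # Drop candidates containing inf or NaN (e.g. log of a non-positive value).
    return out.replace([np.inf, -np.inf], np.nan).dropna(axis=1)


# Round one works on the raw features; round two reuses round-one outputs as inputs.
depth1 = generate_candidates(df)
depth2 = generate_candidates(pd.concat([df, depth1], axis=1))
print(depth1.shape[1], depth2.shape[1])
```

With two raw features, the first round yields 10 candidates. The second round, run over the resulting 12 columns, already proposes 300 expressions (including repeats of the first round) before the non-finite ones are dropped, which hints at how quickly the candidate space grows.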
A well-known approach in this area is Deep Feature Synthesis (DFS), notably implemented in the Python library Featuretools. DFS excels particularly when dealing with relational data spread across multiple tables. It automatically traverses the relationships between these tables and applies predefined functions (called "primitives") to aggregate and transform data, creating meaningful features that might span across the dataset structure. Imagine automatically generating features like 'the average purchase amount for the month before a customer's last session'.
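To make the idea of aggregation primitives concrete, here is a hand-written pandas illustration of what such a primitive computes across a parent-child relationship. This is not the Featuretools API; the tables, column names, and the `AGG_PRIMITIVES` list are hypothetical and exist only for this sketch.

```python
import pandas as pd

# Hypothetical parent table: one row per customer.
customers = pd.DataFrame({"customer_id": [1, 2, 3], "age": [34, 51, 27]})

# Hypothetical child table: many transactions per customer.
transactions = pd.DataFrame({
    "transaction_id": [10, 11, 12, 13, 14],
    "customer_id": [1, 1, 2, 2, 2],
    "amount": [20.0, 40.0, 5.0, 15.0, 10.0],
})

# "Aggregation primitives": functions that collapse many child rows into one value.
AGG_PRIMITIVES = ["mean", "max", "count"]

# Apply every primitive to the child column, grouped by the relationship key.
agg = transactions.groupby("customer_id")["amount"].agg(AGG_PRIMITIVES)
# Name the features in a PRIMITIVE(table.column) style.
agg.columns = [f"{p.upper()}(transactions.amount)" for p in AGG_PRIMITIVES]

# Attach the aggregated features to the parent table. Customers with no
# transactions get NaN for mean and max; their count is set to 0.
feature_matrix = customers.merge(
    agg, left_on="customer_id", right_index=True, how="left"
)
feature_matrix["COUNT(transactions.amount)"] = (
    feature_matrix["COUNT(transactions.amount)"].fillna(0).astype(int)
)
print(feature_matrix)
```

A DFS implementation generalizes this pattern by walking every declared relationship and stacking primitives to a chosen depth. In Featuretools the entry point is the `dfs` function; its documentation is the place to check the exact arguments.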
Other techniques, such as Genetic Programming, use evolutionary algorithms to "evolve" mathematical expressions that represent potentially useful features, optimizing these expressions based on how much they improve a model's predictive performance.
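The evolutionary idea can be sketched with the standard library alone. The version below is deliberately stripped down and rests on several assumptions: expressions are nested tuples, only selection and subtree mutation are used (no crossover, which full genetic programming normally includes), and fitness is the absolute Pearson correlation with the target, a cheap stand-in for measuring the improvement in a model's predictive performance. All function names are hypothetical.

```python
import math
import operator
import random

OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def random_expr(n_features, depth, rng):
    """Build a random expression tree: ("x", i) leaves or (op, left, right) nodes."""
    if depth == 0 or rng.random() < 0.3:
        return ("x", rng.randrange(n_features))
    op = rng.choice(sorted(OPS))
    return (op,
            random_expr(n_features, depth - 1, rng),
            random_expr(n_features, depth - 1, rng))


def evaluate(expr, row):
    """Compute the value of an expression tree for one row of features."""
    if expr[0] == "x":
        return row[expr[1]]
    return OPS[expr[0]](evaluate(expr[1], row), evaluate(expr[2], row))


def mutate(expr, n_features, rng):
    """Replace one randomly chosen subtree with a fresh random subtree."""
    if expr[0] == "x" or rng.random() < 0.3:
        return random_expr(n_features, 2, rng)
    if rng.random() < 0.5:
        return (expr[0], mutate(expr[1], n_features, rng), expr[2])
    return (expr[0], expr[1], mutate(expr[2], n_features, rng))


def fitness(expr, X, y):
    """Absolute Pearson correlation between the expression's values and y."""
    v = [evaluate(expr, row) for row in X]
    mv, my = sum(v) / len(v), sum(y) / len(y)
    cov = sum((a - mv) * (b - my) for a, b in zip(v, y))
    sv = math.sqrt(sum((a - mv) ** 2 for a in v))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sv < 1e-12 or sy < 1e-12:  # constant feature or constant target
        return 0.0
    return abs(cov / sv / sy)


def evolve(X, y, pop_size=30, generations=15, seed=0):
    """Keep the fitter half each generation and refill with mutated survivors."""
    rng = random.Random(seed)
    n = len(X[0])
    pop = [random_expr(n, 2, rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda e: fitness(e, X, y), reverse=True)
        survivors = pop[: pop_size // 2]
        children = [mutate(rng.choice(survivors), n, rng)
                    for _ in range(pop_size - len(survivors))]
        pop = survivors + children
    return max(pop, key=lambda e: fitness(e, X, y))


# Hypothetical toy problem: the target is the product of the two features.
data_rng = random.Random(1)
X = [[data_rng.uniform(-1, 1), data_rng.uniform(-1, 1)] for _ in range(50)]
y = [a * b for a, b in X]
best = evolve(X, y)
print(best, round(fitness(best, X, y), 3))
```

Because the fitter half always survives unchanged, the best score never decreases from one generation to the next. A practical system would add crossover, limit tree size, and score expressions on held-out data rather than on the data used for the search.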
While powerful, automated feature creation comes with its own set of considerations. One of them is interpretability: an automatically generated feature such as MAX(transactions.amount) / SQRT(ABS(customer.age - MEAN(transactions.time_since_last))) can be much harder to interpret and explain than features derived from specific domain insights.
Automated feature creation is generally not a replacement for thoughtful, domain-driven feature engineering. Instead, it serves as a complementary approach. It can rapidly explore a vast space of possibilities, potentially discovering complex interactions or transformations that might not be immediately obvious. The resulting candidate features still need careful evaluation, selection, and validation to ensure they provide genuine predictive value and lead to robust, generalizable models. This section provides a brief glimpse into this area; dedicated libraries and techniques often warrant deeper investigation for effective use in practice.
© 2025 ApX Machine Learning