While linear models provide a strong foundation for many supervised learning tasks, decision trees offer a different, often more intuitive, approach to making predictions. They work by building a tree-like model of decisions based on feature values, leading to a prediction at the leaf nodes. These models are popular due to their interpretability and their ability to capture non-linear relationships.
A decision tree partitions the feature space into a set of rectangles, and then fits a simple model (like a constant) in each one. To predict a new data point, you start at the root of the tree and follow the branches based on the feature values of the point until you reach a leaf node, which contains the predicted outcome. For classification, this is typically the majority class in that region; for regression, it's often the mean of the target values.
One of the main advantages of decision trees is their transparency. The rules learned by the tree can be easily visualized and understood. However, single decision trees can be prone to overfitting, especially if they are deep. This means they might learn the training data too well, including its noise, and perform poorly on unseen data.
Here's a simplified diagram of a decision tree structure:
(Figure: A simple decision tree illustrating how features are used to make sequential decisions leading to a class prediction.)
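In code, the same root-to-leaf traversal can be sketched in a few lines of plain Julia. This is a toy illustration, not MLJ or DecisionTree.jl; the node layout, feature names, and thresholds are invented for the example.
# Toy tree types: an internal node tests one feature against a threshold,
# a leaf stores the predicted class for its region of feature space.
struct Leaf
    prediction::String
end
struct Node
    feature::Symbol
    threshold::Float64
    left::Union{Node, Leaf}    # branch taken when x[feature] < threshold
    right::Union{Node, Leaf}   # branch taken otherwise
end
# Follow branches from the root until a leaf is reached
predict_point(leaf::Leaf, x) = leaf.prediction
predict_point(node::Node, x) =
    x[node.feature] < node.threshold ? predict_point(node.left, x) :
                                       predict_point(node.right, x)
# A hypothetical two-split tree over petal measurements
tree = Node(:PetalLength, 2.5,
            Leaf("setosa"),
            Node(:PetalWidth, 1.8, Leaf("versicolor"), Leaf("virginica")))
predict_point(tree, (PetalLength = 4.7, PetalWidth = 1.4))  # "versicolor"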
Julia, through the MLJ.jl framework, provides access to decision tree algorithms, primarily from the DecisionTree.jl package. Let's see how to implement a decision tree classifier.
First, you'll need to load the model type from the appropriate package. MLJ uses the @load macro for this, which also helps ensure the package containing the model is available in your environment.
using MLJ
using DataFrames  # provides select and Not used below
import RDatasets  # for example data
# Load a decision tree classifier model
DecisionTreeClassifier = @load DecisionTreeClassifier pkg=DecisionTree
# Prepare some data (example using iris dataset)
iris = RDatasets.dataset("datasets", "iris")
X = select(iris, Not(:Species)) # Features
y = iris.Species # Target
# Initialize the model
tree_model = DecisionTreeClassifier()
Once the model is initialized, you can train it using the fit! method on a machine, which binds the model to data:
# Create a machine (model + data)
mach_tree = machine(tree_model, X, y)
# Fit the machine
fit!(mach_tree)
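Because the tree's structure is what makes it interpretable, it is worth knowing that MLJ lets you inspect the learned parameters of any fitted machine through fitted_params; the exact fields returned depend on the model, and for DecisionTreeClassifier they include the raw fitted tree from DecisionTree.jl.
# Inspect what the machine learned; for this model the result includes
# the raw fitted tree, which can be printed or explored further
fp = fitted_params(mach_tree)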
After fitting, you can make predictions on new data (or the training data itself):
# predict returns probabilistic predictions (a class distribution per observation)
predictions = predict(mach_tree, X)
# For classifiers, predict_mode gives the most probable class
predicted_classes = predict_mode(mach_tree, X)
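As a quick sanity check, you can compare the predicted classes with the true labels using MLJ's built-in accuracy measure. Keep in mind this is training-set accuracy, so it will be optimistic; proper evaluation on held-out data is covered later.
# Training-set accuracy (optimistic; use held-out data or cross-validation in practice)
accuracy(predicted_classes, y)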
Common hyperparameters for decision trees include:
max_depth: The maximum depth of the tree. Limiting depth can help prevent overfitting (see the example after this list).
min_samples_split: The minimum number of samples required to split an internal node.
min_samples_leaf: The minimum number of samples required to be at a leaf node.
pruning_purity_threshold (or post_prune and merge_purity_threshold in DecisionTree.jl): Parameters controlling post-pruning to simplify the tree and improve generalization. For instance, DecisionTreeClassifier(post_prune=true, merge_purity_threshold=0.1) would enable pruning.
Adjusting these hyperparameters is typically done through hyperparameter tuning, which you'll learn more about in the "Cross-Validation and Hyperparameter Tuning in MLJ.jl" section.
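As a sketch of how these settings are passed in practice, here is a deliberately constrained tree; the specific values are illustrative, not recommendations.
# A shallower, more regularized tree (illustrative hyperparameter values)
small_tree = DecisionTreeClassifier(
    max_depth=4,           # cap the depth of the tree
    min_samples_split=10,  # require at least 10 samples to split a node
    min_samples_leaf=5     # require at least 5 samples in each leaf
)
mach_small = machine(small_tree, X, y)
fit!(mach_small)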
While individual decision trees are interpretable, their tendency to overfit can be a significant drawback. Ensemble methods address this by combining the predictions of multiple decision trees (or other types of models) to produce a more accurate overall prediction. Two prominent ensemble techniques based on decision trees are Random Forests and Gradient Boosting Machines.
The core idea is that many diverse, individually weak learners can combine to form a strong learner. This approach often leads to substantial improvements in predictive performance and better generalization to unseen data compared to a single, complex model.
Random Forests construct a multitude of decision trees at training time. For classification, the output is the class selected by most trees; for regression, it's the average of the individual tree predictions.
Two main sources of randomness contribute to the diversity of trees in a Random Forest:
Bootstrap sampling (bagging): each tree is trained on a random sample of the training rows, drawn with replacement.
Random feature subsets: at each split, only a random subset of the features is considered as candidate split variables.
These mechanisms help to reduce the variance of the model without substantially increasing the bias.
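A minimal sketch of these two ideas, plus the final majority vote, in plain Julia (not the actual DecisionTree.jl implementation; mode comes from StatsBase):
using Random, StatsBase
# Bagging: each tree is trained on a bootstrap resample of the rows
bootstrap_indices(n_rows, rng=Random.default_rng()) = rand(rng, 1:n_rows, n_rows)
# Random feature subset: each split considers only some of the columns
random_feature_subset(features, k, rng=Random.default_rng()) =
    shuffle(rng, collect(features))[1:k]
# Majority vote: the forest predicts the most common class across its trees
majority_vote(tree_predictions) = mode(tree_predictions)
bootstrap_indices(150)                                # row indices for one tree
random_feature_subset([:SepalLength, :SepalWidth, :PetalLength, :PetalWidth], 2)
majority_vote(["setosa", "versicolor", "setosa"])     # "setosa"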
In MLJ.jl, you can use RandomForestClassifier (or RandomForestRegressor) from DecisionTree.jl.
# Load a Random Forest classifier model
RandomForestClassifier = @load RandomForestClassifier pkg=DecisionTree
# Initialize the model
# n_trees: number of trees in the forest
# n_subfeatures: number of features considered at random at each split;
#   -1 (the default) uses roughly sqrt(number of features), a common choice for classification
# sampling_fraction: fraction of samples used to train each tree
rf_model = RandomForestClassifier(
    n_trees=100,
    sampling_fraction=0.7,
    n_subfeatures=-1
)
# Create and fit the machine
mach_rf = machine(rf_model, X, y)
fit!(mach_rf)
# Make predictions
rf_predictions = predict_mode(mach_rf, X)
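Random Forests also provide impurity-based feature importances as a by-product of training. In recent versions of MLJ these can be retrieved from a fitted machine with feature_importances (the model's feature_importance hyperparameter controls how they are computed); treat the exact interface as version-dependent and check the model's docstring.
# Impurity-based importance of each feature, aggregated over the forest
feature_importances(mach_rf)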
Random Forests are generally less prone to overfitting than individual decision trees and often require less hyperparameter tuning to achieve good performance.
Gradient Boosting Machines (GBMs) also build an ensemble of decision trees, but they do so sequentially. Each new tree attempts to correct the errors made by the previously trained trees. The "gradient" in Gradient Boosting refers to the use of gradient descent to minimize a loss function by iteratively adding trees that predict the residuals (for regression) or pseudo-residuals (for classification) of the current ensemble.
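To make the sequential idea concrete, here is a rough sketch of boosting for squared-error regression, where the pseudo-residuals are simply the residuals. fit_weak_learner is a placeholder for any routine that fits a small tree (or other simple model) to the residuals and returns a prediction function; it is not a real library call.
using Statistics
# Sketch of gradient boosting for squared-error regression.
# fit_weak_learner(X, targets) must return a callable predictor f with f(X) -> predictions.
function boost(X, y, fit_weak_learner; nrounds=100, eta=0.1)
    learners = Any[]                        # the growing ensemble
    pred = fill(mean(y), length(y))         # start from a constant prediction
    for _ in 1:nrounds
        residuals = y .- pred               # negative gradient of 0.5 * (y - pred)^2
        f = fit_weak_learner(X, residuals)  # fit the next tree to the residuals
        push!(learners, f)
        pred .+= eta .* f(X)                # add a shrunken copy of its predictions
    end
    return learners, pred
end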
GBMs are powerful and frequently provide state-of-the-art results on tabular data. Popular implementations include XGBoost, LightGBM, and CatBoost. Julia has its own high-performance gradient boosting library, EvoTrees.jl, which integrates well with MLJ.jl. Alternatively, wrappers for XGBoost are also available.
Let's look at an example using EvoTreesClassifier from EvoTrees.jl.
# Load an EvoTrees classifier model
EvoTreesClassifier = @load EvoTreesClassifier pkg=EvoTrees
# Initialize the model
# nrounds: number of boosting rounds (trees)
# eta: learning rate (shrinks the contribution of each tree)
# max_depth: maximum depth of individual trees
evo_model = EvoTreesClassifier(
nrounds=100,
eta=0.1,
max_depth=6
)
# Create and fit the machine
mach_evo = machine(evo_model, X, y)
fit!(mach_evo)
# Make predictions
evo_predictions = predict_mode(mach_evo, X)
Important hyperparameters for GBMs often include:
nrounds (or n_estimators): The number of trees to build.
learning_rate (or eta): Controls the contribution of each tree. A smaller learning rate usually requires more trees but can lead to better generalization.
max_depth: The maximum depth of each individual tree, controlling its complexity.
subsample: The fraction of training data used to grow each tree (stochastic gradient boosting).
colsample_bytree: The fraction of features considered for each tree.
Gradient Boosting models can be very effective but may require more careful tuning of hyperparameters compared to Random Forests to prevent overfitting, especially if the number of trees is large or the learning rate is high. A conservative starting configuration is sketched after this list.
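As an example of the trade-off between the learning rate and the number of rounds, the configuration below lowers eta, adds rounds, and subsamples rows and columns. EvoTrees.jl exposes the row and column subsampling fractions as rowsample and colsample; the values here are illustrative starting points, and you should confirm parameter names against the version you have installed.
# A more conservative EvoTrees configuration (illustrative values)
conservative_evo = EvoTreesClassifier(
    nrounds=500,     # more trees to compensate for the smaller learning rate
    eta=0.05,        # smaller contribution per tree
    max_depth=4,     # shallower, simpler trees
    rowsample=0.8,   # fraction of rows used to grow each tree
    colsample=0.8    # fraction of features considered for each tree
)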
Both Random Forests and Gradient Boosting Machines, like single decision trees, fit into the MLJ.jl workflow for model evaluation, cross-validation, and hyperparameter tuning, which are covered in subsequent sections. By understanding their mechanics and how to implement them in Julia, you can expand your toolkit for tackling diverse supervised learning problems.