Training a model and evaluating it on a single test set gives you a snapshot of its performance, but this snapshot can sometimes be misleading. If your dataset is small, or if the split happens to be particularly lucky (or unlucky), your evaluation might not accurately reflect how the model will perform on new, unseen data. To build more robust models and get a more reliable estimate of their generalization ability, we turn to techniques like cross-validation. Furthermore, most machine learning models have hyperparameters that need to be set before training; finding the optimal combination of these can significantly impact performance. This section looks at how to perform cross-validation and hyperparameter tuning using MLJ.jl.
Cross-validation is a resampling procedure used to evaluate machine learning models on a limited data sample. The core idea is to split your training data into multiple "folds," then train and evaluate your model multiple times, using a different fold as the test set each time and the remaining folds as the training set. This gives you multiple performance estimates, which can then be averaged to provide a more stable and reliable measure of your model's effectiveness.
The most common form is k-fold cross-validation. Here's how it works: the data is divided into k equal-sized folds; the model is then trained k times, each time using k-1 folds for training and the remaining fold for evaluation; the k resulting scores are averaged to give the final estimate.
Diagram illustrating the K-Fold Cross-Validation process. The data is divided into K folds, and the model is trained and evaluated K times.
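Before turning to MLJ's built-in support, it can help to see the fold-assignment logic spelled out. The following is a minimal sketch in plain Julia, not MLJ's actual implementation; the dataset size (150 rows), fold count, and seed are arbitrary illustrative choices.
using Random

n, k = 150, 5                       # 150 observations, 5 folds (illustrative values)
idx = shuffle(MersenneTwister(123), 1:n)
folds = [idx[i:k:n] for i in 1:k]   # assign every k-th shuffled index to the same fold

for (f, test_idx) in enumerate(folds)
    train_idx = setdiff(idx, test_idx)
    # here you would train on train_idx, evaluate on test_idx, and record the score;
    # the k recorded scores are then averaged
    println("Fold $f: $(length(train_idx)) training rows, $(length(test_idx)) test rows")
end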
In MLJ.jl, cross-validation is performed using the evaluate function (or evaluate!, its counterpart for machines), which takes a model, your features X, target y, a resampling strategy, and one or more performance measures.
Let's see this in action. We'll use a DecisionTreeClassifier and evaluate it using 5-fold cross-validation.
using MLJ
using PrettyPrinting # For nicer dictionary printing
# Load a model and data
DecisionTreeClassifier = @load DecisionTreeClassifier pkg=DecisionTree
X, y = @load_iris;
# Instantiate the model
tree_model = DecisionTreeClassifier()
# Define the resampling strategy: 5-fold Cross-Validation
cv_strategy = CV(nfolds=5, shuffle=true, rng=123)
# Perform evaluation
# We'll use common classification metrics
eval_results = evaluate(tree_model, X, y,
                        resampling=cv_strategy,
                        measures=[accuracy, multiclass_f1score, multiclass_precision, multiclass_recall],
                        verbosity=0)  # verbosity=0 suppresses logging during evaluation
# Display the mean performance across folds
pprint(eval_results.measurement)
# Per-fold results are also available in eval_results.per_fold
When you run this, eval_results.measurement will show a list of metrics, each averaged over the 5 folds. For example, you might see something like [0.92, 0.918, 0.921, 0.92], corresponding to the mean accuracy, F1-score, precision, and recall. Looking at eval_results.per_fold would give you a list of lists, showing the metric scores for each individual fold. This gives a better sense of how the model is likely to perform than a single train-test split.
MLJ.jl supports various resampling strategies beyond CV, such as Holdout (a simple train-test split) and StratifiedCV (which ensures class proportions are maintained in each fold, useful for imbalanced datasets).
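For example, assuming the tree_model, X, and y from the code above, switching strategies only requires changing the resampling argument (the fractions and seeds here are arbitrary):
# Simple 70/30 train-test split
evaluate(tree_model, X, y,
         resampling=Holdout(fraction_train=0.7, shuffle=true, rng=123),
         measures=[accuracy], verbosity=0)

# Stratified 5-fold CV: class proportions are preserved in every fold
evaluate(tree_model, X, y,
         resampling=StratifiedCV(nfolds=5, shuffle=true, rng=123),
         measures=[accuracy], verbosity=0)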
Most machine learning algorithms have hyperparameters: settings that are not learned from the data itself but are chosen by the practitioner before training begins. For example, a decision tree has hyperparameters like max_depth (the maximum depth of the tree) or min_samples_split (the minimum number of samples required to split an internal node). The choice of hyperparameters can significantly affect a model's performance. Hyperparameter tuning (or optimization) is the process of finding the combination of hyperparameter values that yields the best performance for your specific problem.
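As a quick illustration, hyperparameters are simply fields of the model struct, set at construction time or changed afterwards. This short sketch uses the same DecisionTreeClassifier as the rest of the section; the particular values are arbitrary:
using MLJ
DecisionTreeClassifier = @load DecisionTreeClassifier pkg=DecisionTree verbosity=0

shallow_tree = DecisionTreeClassifier(max_depth=2)                       # one hyperparameter set explicitly
deep_tree    = DecisionTreeClassifier(max_depth=10, min_samples_split=5) # a different configuration

shallow_tree.max_depth = 3   # hyperparameters can also be mutated after construction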
Several strategies exist for hyperparameter tuning. A common and straightforward one is Grid Search. In grid search, you define a "grid" of possible hyperparameter values. The algorithm then evaluates the model for every combination of values in this grid, typically using cross-validation for each combination. The combination that results in the best average cross-validation score is chosen.
Diagram of Grid Search for hyperparameter tuning. Each combination of hyperparameters is evaluated using cross-validation to find the best set.
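To make the combinatorics concrete, here is a small sketch (plain Julia, not MLJ's internals) that enumerates the kind of grid used in the example that follows; the candidate values are illustrative:
max_depth_values        = [2, 3, 4, 5]
min_samples_leaf_values = [1, 2, 5]

# The grid is the Cartesian product: each tuple is one candidate configuration,
# and each configuration gets its own cross-validated evaluation
grid = collect(Iterators.product(max_depth_values, min_samples_leaf_values))
println(length(grid))   # 4 × 3 = 12 combinations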
MLJ.jl provides the TunedModel wrapper to automate hyperparameter tuning. You specify the base model, the tuning strategy (such as Grid), the ranges of hyperparameters to explore, the resampling strategy for evaluation, and the performance measure to optimize. Let's tune the max_depth and min_samples_leaf hyperparameters of our DecisionTreeClassifier.
using MLJ
using PrettyPrinting # for pprint below
DecisionTreeClassifier = @load DecisionTreeClassifier pkg=DecisionTree
X, y = @load_iris;
# Instantiate the base model
tree_model = DecisionTreeClassifier()
# Define ranges for hyperparameters
# For max_depth, we'll try values 2, 3, 4, 5
r_max_depth = range(tree_model, :max_depth, lower=2, upper=5, scale=:linear)
# For min_samples_leaf, we'll try values 1, 2, 5
r_min_leaf = range(tree_model, :min_samples_leaf, values=[1, 2, 5])
# Create a TunedModel
tuned_tree = TunedModel(model=tree_model,
                        tuning=Grid(resolution=10),        # points per numeric range, unless explicit values are given
                        resampling=CV(nfolds=3, rng=456),  # CV for the inner tuning loop
                        range=[r_max_depth, r_min_leaf],   # note: the keyword is `range`, even for several ranges
                        measure=accuracy,                  # metric to optimize
                        acceleration=CPUThreads())         # use multi-threading if available
# Wrap the TunedModel in a machine and fit it
mach_tuned_tree = machine(tuned_tree, X, y)
fit!(mach_tuned_tree, verbosity=1) # verbosity=1 shows some tuning progress
# Retrieve the best model and its hyperparameters
best_model_params = fitted_params(mach_tuned_tree)
println("\nBest model hyperparameters:")
pprint(best_model_params.best_model)
# You can also inspect the full report of the tuning process
report_tuned = report(mach_tuned_tree)
# report_tuned.best_history_entry.measurement holds the cross-validated accuracy of the best model
# report_tuned.plotting contains data useful for visualizing the tuning results
# The tuned machine can be used directly for predictions (see below)
# To extract the learned parameters of the best model:
# fitted_params(mach_tuned_tree).best_fitted_params
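Before unpacking the arguments, note that the fitted machine can be used for prediction directly: fitting a machine that wraps a TunedModel retrains the best model on all the supplied data, so, for example:
yhat      = predict(mach_tuned_tree, X)       # probabilistic predictions from the best model
yhat_mode = predict_mode(mach_tuned_tree, X)  # predicted class labels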
In this example:

- We define r_max_depth to explore integers from 2 to 5 for max_depth. The range function in MLJ is quite flexible: for numeric hyperparameters you can specify lower and upper bounds and a scale (e.g., :linear, :log); for hyperparameters with discrete, unordered values you can pass a vector of values.
- The TunedModel is configured with Grid search. The resolution parameter for Grid determines how many points to sample along each numeric range when explicit values are not given.
- 3-fold cross-validation (CV(nfolds=3)) is used internally to evaluate each hyperparameter combination.
- measure is set to accuracy, meaning the grid search will look for the hyperparameters that maximize accuracy.
- fit!(mach_tuned_tree) performs the tuning.
- fitted_params(mach_tuned_tree).best_model gives you an instance of DecisionTreeClassifier with the optimal hyperparameters found.
- report(mach_tuned_tree) provides a detailed report, including the performance for each hyperparameter combination.

Other tuning strategies, such as RandomSearch, are also available in MLJ.jl; these can be more efficient than Grid search when the hyperparameter space is large (see the sketch below).
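As a minimal sketch, reusing tree_model and the two ranges defined above, a random-search tuner could be configured like this (the seed and the number of sampled combinations, n, are arbitrary choices):
random_tuned_tree = TunedModel(model=tree_model,
                               tuning=RandomSearch(rng=789),
                               n=15,                              # sample 15 hyperparameter combinations
                               resampling=CV(nfolds=3, rng=456),
                               range=[r_max_depth, r_min_leaf],
                               measure=accuracy)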
A common issue is to tune hyperparameters using your entire dataset (or cross-validation on the entire dataset) and then report the performance of this tuned model. This performance estimate can be overly optimistic because the tuning process itself has "seen" all the data through cross-validation.
A more reliable approach involves an initial split of your data:

1. Split the data into a training set and a held-out test set.
2. Perform hyperparameter tuning (e.g., using a TunedModel) only on the training set. This identifies the best hyperparameters.
3. Train a final model, configured with those best hyperparameters, on the full training set.
4. Evaluate this final model on the held-out test set to obtain an honest performance estimate.

MLJ.jl's machine abstraction naturally supports this. You would fit the TunedModel machine (step 2) on the training portion (X_train, y_train). The fitted_params from this machine gives you the best_model (a model specification with the optimal hyperparameters). You then create a new, standard machine with this best_model specification and fit it on the full X_train, y_train (step 3), before finally predicting and evaluating on X_test, y_test (step 4). A sketch of this workflow follows.
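Here is one way to write that workflow, reusing tuned_tree, X, and y from earlier; the split fraction and seed are arbitrary:
# Step 1: initial train/test split
train_idx, test_idx = partition(eachindex(y), 0.8, shuffle=true, rng=42)

# Step 2: tune on the training portion only
mach_tuning = machine(tuned_tree, selectrows(X, train_idx), y[train_idx])
fit!(mach_tuning, verbosity=0)
best_spec = fitted_params(mach_tuning).best_model

# Step 3: refit the best specification on the full training set
mach_final = machine(best_spec, selectrows(X, train_idx), y[train_idx])
fit!(mach_final, verbosity=0)

# Step 4: evaluate once on the held-out test set
yhat = predict_mode(mach_final, selectrows(X, test_idx))
println("Test accuracy: ", accuracy(yhat, y[test_idx]))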
By diligently applying cross-validation for evaluation and hyperparameter tuning, you can build models that not only perform well but also come with performance estimates you can trust. These techniques are fundamental to developing effective machine learning solutions in Julia or any other language.