Once you have meticulously constructed and trained your machine learning pipeline, its utility extends far past a single session. Persisting your trained models and pipelines is essential for reusing them without costly retraining, deploying them into applications, sharing them with collaborators, or simply ensuring your work can be reliably reproduced later. In the MLJ.jl ecosystem, this process is straightforward.
MLJ.jl provides the MLJ.save function to serialize a trained machine to a file. A "machine" in MLJ is an object that binds a model (which could be a single learner or a complex pipeline) to data and stores the learned parameters (the fitresult) after training. The MLJ.save function uses the JLSO.jl package under the hood, saving the machine as a .jlso (Julia Serialized Object) file. This format is efficient for storing Julia objects.
Let's say you've trained a machine, mach, which could encapsulate anything from a simple decision tree to a multi-step preprocessing and modeling pipeline. Saving it is as simple as:
using MLJ
# MLJ.save handles serialization (via JLSO.jl) internally; no extra import needed
# Assume 'mach' is a trained machine
# For example:
# ModelType = @load DecisionTreeClassifier pkg=DecisionTree
# model = ModelType()
# X, y = @load_iris
# mach = machine(model, X, y)
# fit!(mach)
# Save the trained machine
MLJ.save("my_trained_model.jlso", mach)
This command saves the entire state of mach, including the model's hyperparameters and its learned parameters, to the file "my_trained_model.jlso".
To bring your saved machine back into a Julia session, you use the machine constructor itself, but instead of providing a model and data, you provide the path to the .jlso file:
# In a new session, or after 'mach' is no longer in memory:
loaded_mach = machine("my_trained_model.jlso")
The loaded_mach is now an exact replica of the machine you saved. It's ready to make predictions or be inspected without needing to be retrained. You can verify this by using it for prediction:
# Assume X_new is new data compatible with the model
# y_pred = predict(loaded_mach, X_new)
# @info "Predictions with loaded model:", y_pred[1:5]
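As a quick round-trip check, the reloaded machine should reproduce the original's predictions exactly. A short sketch, assuming mach and the iris features X from the commented example above are still in scope:

```julia
# Sketch: round-trip check. Assumes `mach` was trained on (X, y) as in the
# commented example above and saved to "my_trained_model.jlso".
using MLJ

loaded_mach = machine("my_trained_model.jlso")

# The learned parameters are serialized verbatim, so point predictions
# from the reloaded machine should match the original's.
@assert predict_mode(mach, X) == predict_mode(loaded_mach, X)
```

If the assertion passes, the serialized fitresult survived the round trip unchanged.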
This seamless saving and loading capability is particularly powerful for pipelines. If mach was a pipeline machine, MLJ.save stores the entire pipeline structure along with the fitted state of each component.
When you use MLJ.save(filename, mach), you are serializing the MLJ Machine object. This object bundles:
- The model, including its hyperparameters (for example, DecisionTreeClassifier(max_depth=3) or a @pipeline object).
- The fitresult, the learned parameters produced by the fit! operation. This is the "intelligence" of your trained model. For a linear model, this would be coefficients; for a tree, it's the tree structure.

The underlying JLSO.jl serializer is designed to handle a wide variety of Julia types, making it well-suited for the complex objects that can constitute an MLJ machine.
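Both parts of this bundle remain accessible after loading. A brief sketch, assuming the decision-tree machine saved earlier:

```julia
# Sketch: inspecting a loaded machine. Assumes "my_trained_model.jlso"
# holds the decision-tree machine from the earlier example.
using MLJ

loaded_mach = machine("my_trained_model.jlso")

loaded_mach.model           # the model and its hyperparameters, e.g. max_depth
fitted_params(loaded_mach)  # the learned parameters (the fitresult), e.g. the tree
report(loaded_mach)         # training byproducts recorded during fit!
```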
Let's walk through a more complete example involving a pipeline. We'll define a simple pipeline, train it, save it, load it, and then use it for prediction.
using MLJ
import RDatasets: dataset
# Load necessary model types
Standardizer = @load Standardizer pkg=MLJModels
OneHotEncoder = @load OneHotEncoder pkg=MLJModels
DecisionTreeClassifier = @load DecisionTreeClassifier pkg=DecisionTree verbosity=0
# Prepare some data (using a subset for brevity)
data = dataset("datasets", "iris")
X = data[:, [:SepalLength, :SepalWidth, :PetalLength, :PetalWidth]]
y = data[:, :Species]
train_rows, test_rows = partition(eachindex(y), 0.7, shuffle=true, rng=123)
X_train = X[train_rows, :]
y_train = y[train_rows]
X_test = X[test_rows, :]
# Define a pipeline
@pipeline MyPipeline(
    std = Standardizer(),
    ohe = OneHotEncoder(),  # a no-op for iris (all features are continuous); included for illustration
    tree = DecisionTreeClassifier(max_depth=3)
) prediction_type=:probabilistic
# Instantiate and train the pipeline machine
pipe_model = MyPipeline()
pipe_mach = machine(pipe_model, X_train, y_train)
fit!(pipe_mach) # the machine is already bound to the training subset X_train, y_train
# Save the trained pipeline machine
MLJ.save("my_pipeline_machine.jlso", pipe_mach)
@info "Pipeline machine saved."
# Simulate loading in a new context
loaded_pipe_mach = machine("my_pipeline_machine.jlso")
@info "Pipeline machine loaded."
# Make predictions with the loaded pipeline
y_pred_loaded = predict(loaded_pipe_mach, X_test)
@info "First 5 predictions from loaded pipeline:" first(y_pred_loaded, 5)
# You can also inspect the fitted parameters or report
# For example, to see the report of the decision tree component:
# report(loaded_pipe_mach).tree
# Or the fitted parameters of the standardizer:
# fitted_params(loaded_pipe_mach).std
This example demonstrates how the entire workflow, from preprocessing (standardization, one-hot encoding) to modeling (decision tree), is encapsulated, trained, saved, and reloaded as a single unit.
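Because the saved file is self-contained, a separate prediction script only needs the right package environment and the file itself. A minimal sketch of such a hypothetical script, assuming the packages used at training time are installed in the active environment:

```julia
# predict_service.jl -- hypothetical standalone script (a sketch).
# Assumes the training-time package environment is active and that
# "my_pipeline_machine.jlso" sits in the working directory.
using MLJ
import DataFrames: DataFrame

loaded_pipe_mach = machine("my_pipeline_machine.jlso")

# Hypothetical incoming observation with the same column schema as training:
X_new = DataFrame(SepalLength=[5.1], SepalWidth=[3.5],
                  PetalLength=[1.4], PetalWidth=[0.2])

y_prob  = predict(loaded_pipe_mach, X_new)       # probabilistic predictions
y_label = predict_mode(loaded_pipe_mach, X_new)  # most likely class
@info "Predicted class:" y_label
```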
The general workflow for saving and loading models and pipelines in MLJ follows the steps shown above: define and train an MLJ machine, save it with MLJ.save, then load the persisted machine with machine(path) in a later session and use it for new predictions.
When saving and loading models, keep these points in mind:
- File extension: .jlso is the convention for JLSO-serialized files produced by MLJ.save; use it consistently.
- Environment: keep your Project.toml and Manifest.toml files alongside your serialized model. These files lock down the exact versions of all packages used, allowing you to recreate the environment if needed.
- Model code: the package providing the model must be available in the loading session (e.g. via using MLJDecisionTreeInterface). MLJ usually handles loading the necessary model code if the package is in your environment.
- File size: serialized machines can produce large .jlso files. Be mindful of storage, especially if versioning many models.
- Security: as with any binary serialization format (such as Python's pickle), only load .jlso files from trusted sources. A maliciously crafted file could potentially execute arbitrary code upon loading. This is generally less of a concern for models you've trained and saved yourself.

By using MLJ.save to save and machine to load, you can effectively manage the lifecycle of your trained machine learning assets in Julia, making your workflows reproducible and ready for deployment. This practice is fundamental to moving from experimentation to production-ready ML systems.
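If you save many model versions, a small helper can generate unique, timestamped filenames. A sketch using only the standard library's Dates module; the naming scheme is just one possible convention:

```julia
using Dates

# Build a versioned filename like "iris_pipeline_2025-01-15T103042.jlso".
# The colon-free timestamp keeps the name filesystem-friendly.
function versioned_filename(basename::AbstractString; ext::AbstractString=".jlso")
    stamp = Dates.format(now(), "yyyy-mm-ddTHHMMSS")
    return string(basename, "_", stamp, ext)
end

# Usage (hypothetical):
# MLJ.save(versioned_filename("iris_pipeline"), pipe_mach)
```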
© 2025 ApX Machine Learning