Having established the fundamentals of versioning data with DVC and tracking experiments with MLflow in the preceding chapters, we now turn to combining these tools. Effectively integrating data management and experiment logging is key to creating truly reproducible machine learning workflows.
This chapter provides practical guidance on topics that include creating pipelines with dvc run and reproducing them with dvc repro.
By the end of this chapter, you will understand how to build integrated systems where changes in data, code, parameters, and results are consistently tracked and managed.
4.1 Connecting Data Versions to Experiments
4.2 Structuring Projects for Integration
4.3 Logging DVC Metadata in MLflow
4.4 Creating DVC Pipelines
4.5 Reproducing DVC Pipelines
4.6 Tracking DVC Pipeline Metrics
4.7 Combining DVC Pipelines and MLflow Tracking
4.8 Best Practices for Integrated Workflows
4.9 Hands-on Practical: Building an Integrated Pipeline
© 2025 ApX Machine Learning