As your machine learning projects grow, you'll often find yourself working with a variety of Python libraries like NumPy, Pandas, Scikit-learn, Matplotlib, and potentially deep learning frameworks such as TensorFlow or PyTorch. Each of these libraries has its own dependencies and specific version requirements. What happens when Project A needs version 1.20 of NumPy, but Project B requires the newer features found only in version 1.22? Installing libraries directly into your main Python installation can quickly lead to conflicts and make it difficult to ensure your projects run reliably across different setups or over time. This is where virtual environments become indispensable.
A virtual environment is essentially an isolated directory containing a specific Python interpreter and its own set of installed libraries. Think of it as a self-contained workspace for each of your Python projects. When you activate a virtual environment, any packages you install or uninstall are confined to that environment, leaving your global Python installation and other project environments untouched.
This isolation prevents conflicts between the dependencies of different projects. You can have multiple virtual environments on your system, each tailored to the specific needs of a particular project.
Each virtual environment maintains its own independent set of installed packages, preventing version conflicts between projects.
Using virtual environments is a fundamental best practice in Python development, and it's particularly beneficial in machine learning workflows. Chief among the reasons is reproducibility: the exact packages and versions in an environment can be recorded in a file (conventionally a requirements.txt for pip or an environment.yml for conda), so anyone else (or you, on a different machine or later in time) can recreate the exact same environment, significantly reducing the "it works on my machine" problem.

venv and conda
Python offers several ways to manage virtual environments. Two are particularly common:
venv: This module is included in the Python standard library (since Python 3.3). It's lightweight and generally sufficient for many projects that primarily rely on packages installable via pip (the Python Package Installer). It creates environments containing a copy or symlink of the Python interpreter and a site-packages directory for new libraries.

conda: This is a package and environment manager that comes with the Anaconda and Miniconda distributions, which are popular in the data science community. conda can manage Python packages but also non-Python software dependencies (like C libraries) and the Python interpreter itself. It's particularly useful when projects have complex dependencies not easily managed by pip alone, or when you need to switch between different Python versions easily.

For most standard Python ML projects where dependencies are available through pip, venv is often the simpler and recommended starting point. If your project involves complex non-Python dependencies or you are already using the Anaconda ecosystem, conda is a powerful alternative.
Here's a typical workflow using venv on the command line:
Create the Environment: Navigate to your project directory and run:
# On macOS/Linux
python3 -m venv my_ml_env
# On Windows
python -m venv my_ml_env
This creates a directory named my_ml_env
(you can choose any name) containing the environment files.
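Environments can also be created from Python itself, which is handy in setup scripts. This is a minimal sketch using the standard library's venv module, equivalent to the command above:

```python
import venv

# Programmatic equivalent of `python -m venv my_ml_env`.
# with_pip=True bootstraps pip into the new environment via ensurepip.
builder = venv.EnvBuilder(with_pip=True)
builder.create("my_ml_env")
```

The command-line form is more common day to day; the programmatic form is mainly useful when environment creation needs to be automated.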
Activate the Environment:
# On macOS/Linux (bash/zsh)
source my_ml_env/bin/activate
# On Windows (Command Prompt)
my_ml_env\Scripts\activate.bat
# On Windows (PowerShell)
my_ml_env\Scripts\Activate.ps1
Your command prompt should change to indicate that the environment is active (e.g., (my_ml_env) your_user@machine:...$).
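Beyond the prompt change, you can also confirm activation from inside Python: in an active venv, sys.prefix points into the environment directory while sys.base_prefix still points at the base installation.

```python
import sys

# The interpreter currently running this code.
print("Interpreter:", sys.executable)

# These differ only when a virtual environment is active.
in_venv = sys.prefix != sys.base_prefix
print("Inside a virtual environment:", in_venv)
```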
Install Packages: Now, use pip
to install the libraries needed for your project. These will be installed inside the active environment.
pip install numpy pandas scikit-learn matplotlib seaborn
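Once installation finishes, one quick way to confirm the packages are visible to the active interpreter is importlib.metadata, available in the standard library since Python 3.8:

```python
from importlib.metadata import version, PackageNotFoundError

# Report the installed version of each package, or flag it as missing.
for pkg in ["numpy", "pandas", "scikit-learn", "matplotlib", "seaborn"]:
    try:
        print(f"{pkg}=={version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg} is not installed in this environment")
```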
Freeze Dependencies: To make your environment reproducible, save the list of installed packages and their exact versions into a file, conventionally named requirements.txt.
pip freeze > requirements.txt
This file can be shared with others or used later to recreate the environment.
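The resulting file is plain text with one pinned package per line. The version numbers below are purely illustrative; yours will reflect whatever pip actually installed:

```text
numpy==1.22.4
pandas==1.4.2
scikit-learn==1.1.1
matplotlib==3.5.2
seaborn==0.11.2
```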
Install from Requirements: If you receive a project with a requirements.txt
file, you can create a new virtual environment, activate it, and then install all dependencies with:
pip install -r requirements.txt
Deactivate the Environment: When you're finished working on the project, you can deactivate the environment:
deactivate
This returns you to your system's global Python context.
As discussed in the section on structuring projects, you should typically create a virtual environment within or alongside your main project folder. It's common practice to add the environment directory (my_ml_env/
in the example above) to your project's .gitignore
file to prevent committing the environment itself to version control; only the requirements.txt
file needs to be tracked.
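The corresponding .gitignore entry can be as simple as the following, assuming the environment directory name from the example above:

```text
# Ignore the local virtual environment; requirements.txt is tracked instead.
my_ml_env/
```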
Adopting virtual environments from the start of your machine learning projects is a simple yet effective step towards creating more robust, maintainable, and collaborative codebases. It eliminates a common source of errors and ensures that your carefully crafted data pipelines and models behave consistently across different setups.
© 2025 ApX Machine Learning