Before we start building models, let's get your environment set up correctly. Scikit-learn is designed to work smoothly within the existing Python scientific computing stack. Having a proper setup ensures that you can follow along with the examples and apply these techniques to your own projects without compatibility issues.
As mentioned in the course prerequisites, you should already have Python installed, along with a basic familiarity with NumPy and Pandas. Scikit-learn builds directly on these libraries, particularly NumPy for its array structures and mathematical functions, and integrates well with Pandas DataFrames.
It's highly recommended to use a virtual environment for your Python projects, including this course. Virtual environments isolate project dependencies, preventing conflicts between different projects that might require different versions of the same library.
Using venv (Python's built-in tool):
Create an environment: Open your terminal or command prompt, navigate to your project directory, and run:
python -m venv sklearn-env
(Replace sklearn-env with your preferred environment name.)
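If you prefer to script this step, the standard library's venv module offers the same functionality from inside Python. The following is a minimal sketch; the helper name create_env is hypothetical, and the directory name sklearn-env is just the example used above.

```python
import venv
from pathlib import Path


def create_env(env_dir: str, with_pip: bool = True) -> Path:
    """Create a virtual environment at env_dir (like `python -m venv env_dir`)."""
    path = Path(env_dir)
    # venv.create is the programmatic counterpart of the command-line tool;
    # with_pip=True bootstraps pip into the new environment via ensurepip.
    venv.create(path, with_pip=with_pip)
    return path
```

Calling create_env("sklearn-env") produces the same directory layout as the terminal command, including a pyvenv.cfg file that records which base interpreter the environment was created from. You still activate it from the terminal as described next.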
Activate the environment:
On macOS/Linux:
source sklearn-env/bin/activate
On Windows:
sklearn-env\Scripts\activate
Your terminal prompt should now indicate that the virtual environment is active.
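Besides the prompt, you can check from Python itself whether a venv is active. A small sketch (in_virtualenv is a hypothetical helper name): inside an environment created with venv, sys.prefix points at the environment directory while sys.base_prefix still points at the original interpreter. This check is meant for venv-style environments; conda environments are full Python installations, so it typically reports False there.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when running inside a venv-style virtual environment."""
    # In a venv, sys.prefix is the environment directory, while
    # sys.base_prefix is the installation the venv was created from.
    return sys.prefix != sys.base_prefix


print(f"Interpreter prefix: {sys.prefix}")
print(f"Inside a virtual environment: {in_virtualenv()}")
```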
Using conda (if you use Anaconda or Miniconda):
Create an environment:
conda create --name sklearn-env python=3.11
(Replace sklearn-env with your chosen name and specify a Python version if desired.) Conda will ask you to confirm the packages to be installed.
Activate the environment:
conda activate sklearn-env
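For conda, a quick way to confirm which environment is active from Python is to read the CONDA_DEFAULT_ENV variable that conda activate sets. A minimal sketch, assuming a standard conda setup (active_conda_env is a hypothetical helper name):

```python
import os
from typing import Optional


def active_conda_env() -> Optional[str]:
    """Return the name of the active conda environment, or None if none is set."""
    # `conda activate <name>` sets CONDA_DEFAULT_ENV to the environment's name
    # (it reads "base" when only the base environment is active).
    return os.environ.get("CONDA_DEFAULT_ENV")


print(f"Active conda environment: {active_conda_env() or 'none detected'}")
```

If this prints base or none detected while you expected sklearn-env, run conda activate sklearn-env again before installing anything.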
Working within an active virtual environment ensures that packages you install are contained and won't interfere with your global Python installation or other projects.
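To see this isolation concretely, you can ask the running interpreter where pip will place pure-Python packages. This sketch uses the standard library's sysconfig module; with an environment active, the reported site-packages directory should normally sit inside that environment's directory rather than your global installation.

```python
import sys
import sysconfig

# "purelib" is the site-packages directory used for pure-Python packages
# by the running interpreter's default install scheme.
site_packages = sysconfig.get_paths()["purelib"]

print(f"Packages will be installed into: {site_packages}")
print(f"Located under sys.prefix: {site_packages.startswith(sys.prefix)}")
```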
With your virtual environment activated, you can install Scikit-learn. The library itself depends on NumPy and SciPy. Installing Scikit-learn using standard package managers like pip or conda typically handles these dependencies automatically.
Using pip:
This is the standard Python package installer. It's usually the simplest method if you're not using the Anaconda distribution.
pip install scikit-learn
This command will download and install the latest stable version of Scikit-learn, along with required versions of NumPy and SciPy if they are not already present in your environment.
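After installation, you can list what the installed Scikit-learn release declares as its required dependencies by reading its package metadata with importlib.metadata. This is a sketch; core_requirements is a hypothetical helper, and the exact list and version pins depend on the release you installed (alongside NumPy and SciPy, expect small helper packages such as joblib and threadpoolctl).

```python
from importlib.metadata import PackageNotFoundError, requires


def core_requirements(dist_name: str = "scikit-learn") -> list[str]:
    """List the unconditional requirements a distribution declares."""
    try:
        reqs = requires(dist_name) or []
    except PackageNotFoundError:
        return []  # the distribution is not installed in this environment
    # Requirements carrying an `extra == ...` marker belong to optional
    # extras (documentation, tests, benchmarks), so skip them.
    return [r for r in reqs if "extra ==" not in r]


for requirement in core_requirements():
    print(requirement)
```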
Using conda:
If you are using Anaconda or Miniconda, conda is the preferred package manager.
conda install scikit-learn
Conda manages both packages and environments and installs prebuilt binaries for complex compiled dependencies, which can simplify installation, especially on Windows.
Scikit-learn's main dependencies are:
NumPy (<https://numpy.org/>): Scikit-learn's primary data structures are often NumPy arrays.
SciPy (<https://scipy.org/>).
These libraries are foundational to the scientific Python ecosystem and are installed automatically alongside Scikit-learn.
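Because Scikit-learn estimators consume NumPy arrays, it helps to know the shape convention the library uses throughout: features form a 2-D array of shape (n_samples, n_features), and targets form a 1-D array of shape (n_samples,). A minimal sketch with made-up values:

```python
import numpy as np

# Toy data for illustration: one feature measured on three samples.
heights_cm = [150.0, 165.0, 180.0]

# Features must be 2-D, even with a single feature: shape (3, 1).
X = np.asarray(heights_cm).reshape(-1, 1)

# Targets are 1-D, one label per sample: shape (3,).
y = np.array([0, 1, 1])

print(X.shape, y.shape)  # (3, 1) (3,)
```

Passing a flat 1-D array where a 2-D feature matrix is expected is a common beginner error, which is why reshape(-1, 1) appears so often in Scikit-learn code.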
While not strictly required for basic Scikit-learn usage, you'll often work with other libraries in a typical machine learning workflow:
Pandas (<https://pandas.pydata.org/>): Scikit-learn integrates very well with Pandas DataFrames.
Matplotlib (<https://matplotlib.org/>): plotting, useful for exploring data and evaluating models.
Seaborn (<https://seaborn.pydata.org/>): a statistical visualization library built on Matplotlib.
If these are not already installed in your environment, you can add them using pip or conda:
Using pip:
pip install pandas matplotlib seaborn jupyterlab
(We've included jupyterlab here because it provides a convenient interactive environment for data science work, which you might find useful.)
Using conda:
conda install pandas matplotlib seaborn jupyterlab
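To check which of these optional libraries are present without actually importing them, you can use importlib.util.find_spec from the standard library. A sketch (installed is a hypothetical helper name); note that it takes import names, which happen to match the package names here but not always (Scikit-learn installs as scikit-learn and imports as sklearn).

```python
import importlib.util

# Import names of the optional libraries installed above.
OPTIONAL = ["pandas", "matplotlib", "seaborn", "jupyterlab"]


def installed(names: list[str]) -> dict[str, bool]:
    """Map each top-level import name to whether Python can find it."""
    # find_spec returns None when no module with that name is importable.
    return {name: importlib.util.find_spec(name) is not None for name in names}


for name, ok in installed(OPTIONAL).items():
    print(f"{name:<12} {'installed' if ok else 'missing'}")
```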
To confirm that Scikit-learn is installed correctly, you can start a Python interpreter or a Jupyter Notebook session within your activated virtual environment and run the following commands:
import sklearn
import numpy
import scipy

# Print the versions of scikit-learn and its core dependencies
print(f"Scikit-learn version: {sklearn.__version__}")
print(f"NumPy version: {numpy.__version__}")
print(f"SciPy version: {scipy.__version__}")

# Optional libraries: report them if installed, without failing otherwise
try:
    import pandas
    print(f"Pandas version: {pandas.__version__}")
except ImportError:
    print("Pandas is not installed (optional)")
try:
    import matplotlib
    print(f"Matplotlib version: {matplotlib.__version__}")
except ImportError:
    print("Matplotlib is not installed (optional)")
If these commands execute without raising an ImportError and print the library versions, your environment is set up correctly, and you're ready to proceed with exploring Scikit-learn's features. We will perform a more formal verification in the hands-on practical at the end of this chapter.
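Scikit-learn also ships a built-in report, sklearn.show_versions(), which prints system information and dependency versions. If you would rather check versions without importing the libraries at all (for example, to diagnose a broken install), a sketch using importlib.metadata reads the installed package metadata instead. The helper name installed_versions is hypothetical, and the names passed in are distribution names as used by pip and conda.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, List, Optional

# Distribution names (note "scikit-learn", not the import name "sklearn").
PACKAGES = ["scikit-learn", "numpy", "scipy", "pandas", "matplotlib"]


def installed_versions(names: List[str]) -> Dict[str, Optional[str]]:
    """Return each distribution's installed version, or None if it is absent."""
    result: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            # Reads package metadata only; the package is not imported.
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


for name, ver in installed_versions(PACKAGES).items():
    print(f"{name:<14} {ver or 'not installed'}")
```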
© 2025 ApX Machine Learning