Before you can start working with NumPy and Pandas, you need to install them on your computer. These libraries don't come built into Python by default, so you'll need to add them. This process involves using tools called package managers, which help download and install software libraries like the ones we need.
There are two primary ways to set up your environment for data science work in Python: using the Anaconda distribution or using Python's standard package installer, pip. We recommend Anaconda for beginners as it simplifies the process significantly.
Anaconda is a free and open-source distribution of Python and R specifically designed for scientific computing and data science. It comes bundled with Python, many essential libraries (including NumPy, Pandas, and Jupyter Notebooks), and its own package manager called conda.
Why Anaconda?
conda makes it easy to create isolated environments for different projects, preventing conflicts between library versions.
Steps:
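Verify that conda is available: After installing Anaconda, open a terminal (or the Anaconda Prompt on Windows) and run:
conda --version
You should see the conda version number printed.
Check for NumPy and Pandas: conda list accepts a single search pattern, so quote the pattern to match both names:
conda list "numpy|pandas"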
If they are listed, you're ready to go! If not, or if you want to ensure you have the latest versions compatible with the distribution, you can install or update them:
# To install if missing:
conda install numpy pandas jupyterlab
# To update existing packages:
conda update numpy pandas jupyterlab
We include jupyterlab here as it provides the Jupyter Notebook environment we'll use shortly.
If you already have Python installed on your system (downloaded from python.org or installed via other means) and prefer not to use Anaconda, you can use pip, Python's default package installer.
Why pip?
Best Practice: Virtual Environments
When using pip, it's highly recommended to use virtual environments. A virtual environment is an isolated directory containing a specific Python version and its own set of installed libraries. This prevents conflicts between packages required for different projects. Python has built-in support for this via the venv module.
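To make the idea of "an isolated directory" concrete, here is a minimal sketch that creates a throwaway environment from Python itself using the standard-library venv module and lists what ends up inside it. The helper name create_demo_env and the temporary location are illustrative assumptions, and with_pip=False is used only to keep the demonstration fast; the steps that follow use the command-line form, which is what you would normally run.

```python
import tempfile
import venv
from pathlib import Path


def create_demo_env(parent: Path, name: str = "myenv") -> Path:
    """Create a virtual environment under `parent` and return its path.

    Hypothetical helper for illustration; roughly what
    `python -m venv myenv` does, minus installing pip.
    """
    env_dir = parent / name
    # with_pip=False skips bootstrapping pip so the demo stays quick.
    venv.create(env_dir, with_pip=False)
    return env_dir


# Build the environment in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    env_dir = create_demo_env(Path(tmp))
    # pyvenv.cfg is the marker file that identifies a venv directory.
    has_config = (env_dir / "pyvenv.cfg").is_file()
    entries = sorted(p.name for p in env_dir.iterdir())
    print(f"Contents of {env_dir.name}: {entries}")
    print(f"pyvenv.cfg present: {has_config}")
```

The exact directory names differ between operating systems, but the pyvenv.cfg file should be present in every environment created this way.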
Steps:
Ensure Python and pip: First, make sure you have Python 3 installed. Open your terminal or command prompt and type:
python --version
# On some systems (notably macOS and Linux) the command is python3:
# python3 --version
pip --version
# Likewise, pip may be available as pip3:
# pip3 --version
If these commands work and show versions, you're set. If not, you'll need to install Python first from python.org. pip is typically included with Python versions 3.4 and later.
Create a Virtual Environment: Navigate to your project directory (or create one) in your terminal and run:
# Replace 'myenv' with your preferred environment name
python -m venv myenv
This creates a directory named myenv containing the environment's own Python interpreter and a separate location for the packages you install into it.
Activate the Environment: Before installing packages, you need to activate the environment. The command depends on your operating system and shell:
macOS/Linux (bash or zsh):
source myenv/bin/activate
Windows (Command Prompt):
myenv\Scripts\activate.bat
Windows (PowerShell):
myenv\Scripts\Activate.ps1
(In PowerShell, you might need to adjust the execution policy first: Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope Process)
Your terminal prompt should change to indicate that the environment (myenv in this case) is active.
Install Libraries: Now, use pip to install NumPy, Pandas, and JupyterLab:
pip install numpy pandas jupyterlab
pip will download and install the libraries into your active virtual environment.
Deactivate (When Done): When you finish working on your project, you can deactivate the environment by simply typing:
deactivate
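If you are ever unsure whether an environment is currently active, you can ask Python directly. The sketch below relies on the fact that inside a venv-created environment sys.prefix points at the environment directory while sys.base_prefix points at the original Python installation. The function name in_virtual_environment is an illustrative assumption, and the check is aimed at venv-style environments; it may report False inside a conda environment.

```python
import sys


def in_virtual_environment() -> bool:
    """Return True when the running interpreter belongs to a venv-style environment."""
    # Inside a venv, sys.prefix is the environment directory, while
    # sys.base_prefix still refers to the installation it was created from.
    return sys.prefix != sys.base_prefix


print(f"Interpreter: {sys.executable}")
print(f"Environment prefix: {sys.prefix}")
print(f"Virtual environment active: {in_virtual_environment()}")
```

Running this once with the environment activated and once after deactivate should show the interpreter path and the reported status change.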
Regardless of the method you chose, it's good practice to verify that the libraries were installed correctly and can be imported in Python.
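As a scripted alternative to checking by hand, the following minimal sketch looks up installed versions with the standard-library importlib.metadata module (available in Python 3.8 and later). The function name installed_versions is an illustrative assumption. Note that it reads package metadata only and does not import the libraries, so it complements the interactive import check rather than replacing it.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in the current environment.
            versions[name] = None
    return versions


report = installed_versions(["numpy", "pandas", "jupyterlab"])
for name, version in report.items():
    status = version if version is not None else "not installed"
    print(f"{name}: {status}")
```

Any package reported as "not installed" can be added with the conda or pip install command shown earlier, run inside the environment you intend to use.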
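To check interactively, open your terminal (with your environment activated, if you created one), type:
python
and press Enter. You should see the Python prompt (>>>).
At the prompt, enter the following lines one at a time:
import numpy as np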
import pandas as pd
print(f"NumPy version: {np.__version__}")
print(f"Pandas version: {pd.__version__}")
If both version numbers print without errors, the libraries are installed correctly. To finish, type:
exit()
and press Enter to leave the Python interpreter.
With your environment now set up, you have the foundational tools ready. In the next section, we'll start using these libraries by running some basic code examples within a Jupyter Notebook, the interactive environment favored by many data scientists.
© 2025 ApX Machine Learning