With the purpose of EDA and the standard Python libraries covered, the next step is to prepare your local machine. A consistent, isolated development environment keeps each project's dependencies separate, which prevents version conflicts between projects and makes your work reproducible. We'll outline two common approaches: using Python's built-in venv module with pip, or using the Anaconda distribution.
Using venv and pip
Python 3 includes the venv module for creating lightweight virtual environments. This is often preferred if you already have a Python installation you manage yourself.
Create a Virtual Environment: Open your terminal or command prompt, navigate to your desired project directory, and run:
# On macOS/Linux
python3 -m venv eda-env
# On Windows
python -m venv eda-env
This command creates a directory named eda-env (you can choose any name) containing a copy of, or a link to, the Python interpreter and a place to install libraries.
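The same step can also be driven from Python through the standard-library venv module. The following is a minimal sketch, not a replacement for the command above: the helper name create_env is hypothetical, the environment is built in a temporary directory, and with_pip=False is an assumption chosen to keep the example fast (the command-line form installs pip by default).

```python
import tempfile
import venv
from pathlib import Path


def create_env(env_dir: Path) -> Path:
    """Create a virtual environment at env_dir and return the path of its config file."""
    # with_pip=False skips bootstrapping pip, which keeps this sketch quick.
    venv.create(env_dir, with_pip=False)
    # Every venv contains a pyvenv.cfg file at its root.
    return env_dir / "pyvenv.cfg"


# Build the environment in a throwaway directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    cfg = create_env(Path(tmp) / "eda-env")
    cfg_exists = cfg.exists()
    print(f"pyvenv.cfg created: {cfg_exists}")
```

For everyday use the terminal command is simpler; the programmatic form is mainly useful in setup scripts.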
Activate the Environment: Before installing packages, you need to activate the environment:
# On macOS/Linux
source eda-env/bin/activate
# On Windows (Command Prompt)
eda-env\Scripts\activate.bat
# On Windows (PowerShell)
eda-env\Scripts\Activate.ps1
Your command prompt should now indicate that you are inside the eda-env environment, typically by prefixing it with (eda-env).
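If the prompt does not make it obvious, you can also ask Python itself. The sketch below compares sys.prefix with sys.base_prefix, which differ inside an environment created by venv. The function name in_virtualenv is hypothetical, and the check applies to venv-style environments only; conda environments are not detected this way.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter is running inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print(f"Interpreter: {sys.executable}")
print(f"Inside a venv: {in_virtualenv()}")
```

When the environment is active, the interpreter path printed first should sit inside the eda-env directory.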
Install Required Libraries: With the environment active, use pip, Python's package installer, to install the core libraries needed for this course:
pip install pandas numpy matplotlib seaborn jupyterlab
This command downloads and installs Pandas (for data manipulation), NumPy (for numerical operations), Matplotlib and Seaborn (for visualization), and JupyterLab (a popular interactive development environment for data science). Jupyter Notebook is a viable alternative if you prefer it (pip install notebook).
Using Anaconda
Anaconda is a popular Python distribution designed for data science. It comes pre-packaged with many common libraries and includes its own environment and package manager, conda. Miniconda is a minimal installer that provides conda and Python without Anaconda's large bundled library set. If you prefer this ecosystem:
Install Anaconda or Miniconda: Download and install Anaconda or Miniconda from their official websites if you haven't already. Follow the instructions provided for your operating system.
Create a Conda Environment: Open your terminal or Anaconda Prompt and create a new environment:
conda create --name eda-env python=3.9 pandas numpy matplotlib seaborn jupyterlab -y
This command creates an environment named eda-env with a specific Python version (3.9 here; adjust as needed) and installs the listed packages in the same step. The -y flag automatically confirms the installation prompts.
Activate the Environment: Activate the newly created environment:
conda activate eda-env
Your prompt will change to show (eda-env).
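To confirm from inside Python which conda environment is active, you can read the CONDA_DEFAULT_ENV environment variable, which conda sets on activation. This is a small sketch; the helper name conda_env_name and its environ parameter are hypothetical and exist only to make the lookup easy to try.

```python
import os
from typing import Mapping, Optional


def conda_env_name(environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the name of the active conda environment, or None if there is none."""
    # conda sets CONDA_DEFAULT_ENV when an environment is activated.
    return environ.get("CONDA_DEFAULT_ENV")


name = conda_env_name()
print(f"Active conda environment: {name if name else 'none detected'}")
```

After running conda activate eda-env, this should report eda-env; outside of conda it reports that none was detected.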
Regardless of the method chosen, you can verify that the essential libraries are installed correctly. With your virtual environment activated, start a Python interpreter by typing python in the terminal, or launch JupyterLab by typing jupyter lab and opening a new notebook.
Inside the Python interpreter or a Jupyter notebook cell, try importing the libraries:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# Print library versions (optional, but good practice)
print(f"Pandas version: {pd.__version__}")
print(f"NumPy version: {np.__version__}")
# Matplotlib's version string lives on the top-level package, so import it as well
import matplotlib
print(f"Matplotlib version: {matplotlib.__version__}")
print(f"Seaborn version: {sns.__version__}")
# Test a simple command
df_test = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
print("\nTest DataFrame created successfully:")
print(df_test.head())
If these commands execute without any ImportError messages, your environment is correctly set up with the necessary libraries.
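If an import does fail, it helps to see at a glance which packages are missing. The following sketch uses the standard-library importlib.metadata module to look up installed versions without importing the packages themselves. The function name installed_versions and the REQUIRED list are hypothetical; the list simply mirrors the packages installed earlier.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as passed to pip or conda earlier in this section.
REQUIRED = ["pandas", "numpy", "matplotlib", "seaborn", "jupyterlab"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if it is missing."""
    report = {}
    for name in names:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            # The distribution is not installed in the active environment.
            report[name] = None
    return report


for name, ver in installed_versions(REQUIRED).items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```

Any package reported as missing can be installed with the same pip or conda command used above, run with the environment activated.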
You are now equipped with the tools and a dedicated environment to proceed with loading, cleaning, analyzing, and visualizing data in the upcoming chapters.
© 2025 ApX Machine Learning