Alright, let's put theory into practice. This hands-on session focuses on solidifying your understanding of Pandas Series and DataFrames. We'll work through creating these fundamental data structures using different methods and then apply various techniques to inspect them, getting a feel for their contents and characteristics. We assume you are working within a Jupyter Notebook environment, as introduced in Chapter 1.
First, ensure you have Pandas imported. It's conventional to import it under the alias pd. We'll also import NumPy as np, since Pandas often works alongside it.
import pandas as pd
import numpy as np
print("Pandas version:", pd.__version__)
print("NumPy version:", np.__version__)
A Series is like a one-dimensional labeled array. Let's create a few.
1. From a Python List: Pandas automatically creates a default integer index if you don't specify one.
# Simple list of numbers
data_list = [10, 20, 30, 40, 50]
series_from_list = pd.Series(data_list)
print("Series created from a list:")
print(series_from_list)
# Accessing an element by its index label (the default labels 0..4 match list positions)
print("\nFirst element:", series_from_list[0])
2. From a Python List with a Custom Index: You can provide meaning to the index by assigning labels.
# List of temperatures
temperatures = [22.5, 24.1, 19.8, 23.0]
# Corresponding days
days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday']
series_with_index = pd.Series(temperatures, index=days)
print("\nSeries with a custom index:")
print(series_with_index)
# Accessing using the custom index
print("\nTemperature on Tuesday:", series_with_index['Tuesday'])
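Both lookups so far go through the index labels. As a minimal sketch of the difference between label-based and position-based access, the snippet below rebuilds the temperature Series under the illustrative name `temps` (not part of the original session) and uses `.loc` for labels and `.iloc` for integer positions.

```python
import pandas as pd

# Rebuild a small labeled Series (same data as the temperatures example)
temps = pd.Series([22.5, 24.1, 19.8, 23.0],
                  index=['Monday', 'Tuesday', 'Wednesday', 'Thursday'])

# .loc selects by index label
by_label = temps.loc['Tuesday']

# .iloc selects by integer position, regardless of the labels
by_position = temps.iloc[1]

print("By label:", by_label)
print("By position:", by_position)

# Label slices with .loc include the end label; positional slices exclude the end
label_slice = temps.loc['Monday':'Wednesday']
position_slice = temps.iloc[0:2]
print(label_slice)
print(position_slice)
```

Being explicit with `.loc` and `.iloc` avoids ambiguity once the index is no longer the default 0, 1, 2, ... sequence.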
3. From a Python Dictionary: Dictionaries naturally map keys (which become the index) to values.
# Population data (in millions)
population_dict = {'California': 39.5, 'Texas': 29.1, 'Florida': 21.5, 'New York': 19.4}
series_from_dict = pd.Series(population_dict)
print("\nSeries created from a dictionary:")
print(series_from_dict)
# Check the data type
print("\nData type of the series:", series_from_dict.dtype) # Floats in this case
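A dictionary can also be combined with an explicit index. The sketch below assumes you want only some of the keys, in a particular order; the extra label 'Ohio' is there purely to illustrate what happens when a label has no matching key.

```python
import pandas as pd

population_dict = {'California': 39.5, 'Texas': 29.1, 'Florida': 21.5, 'New York': 19.4}

# Passing index= selects and orders entries by key.
# 'Ohio' is not a key in the dictionary, so its value becomes NaN.
selected = pd.Series(population_dict, index=['Texas', 'California', 'Ohio'])
print(selected)
print("Missing entries:", selected.isna().sum())
```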
4. From a NumPy Array: You can easily convert NumPy arrays into Series.
# A NumPy array
np_array = np.array([100, 200, 300, 400])
series_from_numpy = pd.Series(np_array, index=['a', 'b', 'c', 'd'])
print("\nSeries created from a NumPy array:")
print(series_from_numpy)
# Check its index and values
print("\nIndex:", series_from_numpy.index)
print("Values:", series_from_numpy.values) # Returns a NumPy array
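Because a Series wraps array data, NumPy-style element-wise arithmetic works on it directly. The following sketch also shows `to_numpy()`, which the pandas documentation recommends over `.values` for getting a NumPy array back; the names `doubled` and `as_array` are illustrative.

```python
import numpy as np
import pandas as pd

series_from_numpy = pd.Series(np.array([100, 200, 300, 400]),
                              index=['a', 'b', 'c', 'd'])

# Arithmetic is applied element-wise and the index labels are preserved
doubled = series_from_numpy * 2
print(doubled)

# to_numpy() returns the data as a NumPy array, without the index
as_array = series_from_numpy.to_numpy()
print(type(as_array), as_array)
```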
DataFrames are the workhorse of Pandas, representing tabular data.
1. From a Dictionary of Lists: This is a very common way to create DataFrames. Each dictionary key becomes a column name, and the list associated with it becomes the data for that column. All lists must have the same length.
# Data for students
student_data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [20, 21, 19, 22],
    'Major': ['CompSci', 'Physics', 'Math', 'CompSci'],
    'GPA': [3.8, 3.5, 3.9, 3.2]
}
df_students = pd.DataFrame(student_data)
print("DataFrame created from a dictionary of lists:")
print(df_students)
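To see the equal-length requirement in action, here is a small sketch using made-up data (`bad_data` is hypothetical) in which one list is shorter than the other. Pandas refuses to build the DataFrame and raises a `ValueError`.

```python
import pandas as pd

# Hypothetical data where one list is shorter than the other
bad_data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [20, 21],  # only two values for three names
}

try:
    pd.DataFrame(bad_data)
    raised = False
except ValueError as exc:
    raised = True
    print("Could not build DataFrame:", exc)
```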
2. From a List of Dictionaries:
Each dictionary in the list represents a row. Pandas infers the column names from the keys. Missing keys in a dictionary will result in NaN (Not a Number) values.
# Data where some info might be missing
sensor_readings = [
    {'sensor': 'A', 'temp': 25.5, 'humidity': 60},
    {'sensor': 'B', 'temp': 26.1},  # Humidity missing
    {'sensor': 'A', 'temp': 25.8, 'humidity': 62},
    {'sensor': 'C', 'temp': 24.9, 'pressure': 1012}  # Different fields
]
df_sensors = pd.DataFrame(sensor_readings)
print("\nDataFrame created from a list of dictionaries:")
print(df_sensors)
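Notice the NaN values where data was missing. Pandas fills these in automatically; as a side effect, a numeric column that contains NaN (such as humidity) is stored as floating point even though the original values were integers.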
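A quick way to quantify the gaps is to count missing cells per column. The sketch below rebuilds the sensor DataFrame so it can run on its own; `missing_per_column` is an illustrative name.

```python
import pandas as pd

sensor_readings = [
    {'sensor': 'A', 'temp': 25.5, 'humidity': 60},
    {'sensor': 'B', 'temp': 26.1},
    {'sensor': 'A', 'temp': 25.8, 'humidity': 62},
    {'sensor': 'C', 'temp': 24.9, 'pressure': 1012},
]
df_sensors = pd.DataFrame(sensor_readings)

# isna() marks missing cells as True; summing counts them per column
missing_per_column = df_sensors.isna().sum()
print(missing_per_column)

# dtypes shows the storage type chosen for each column
print(df_sensors.dtypes)
```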
3. From a 2D NumPy Array: You can create a DataFrame from a NumPy array, optionally providing column and index names.
# 2D NumPy array
data_np = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
# Create DataFrame with default index/columns
df_from_np_default = pd.DataFrame(data_np)
print("\nDataFrame from NumPy array (default names):")
print(df_from_np_default)
# Create DataFrame with custom names
df_from_np_custom = pd.DataFrame(data_np,
                                 index=['Row1', 'Row2', 'Row3'],
                                 columns=['ColA', 'ColB', 'ColC'])
print("\nDataFrame from NumPy array (custom names):")
print(df_from_np_custom)
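Custom row and column names pay off when you select data. As a brief sketch (the variables `col_b` and `cell` are illustrative), the labels can be used to pull out a column or a single cell.

```python
import numpy as np
import pandas as pd

df_from_np_custom = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
                                 index=['Row1', 'Row2', 'Row3'],
                                 columns=['ColA', 'ColB', 'ColC'])

# Select a whole column (returns a Series indexed by the row labels)
col_b = df_from_np_custom['ColB']
print(col_b)

# Select a single cell by row label and column label
cell = df_from_np_custom.loc['Row2', 'ColB']
print("Row2 / ColB:", cell)
```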
Now let's use the df_students DataFrame we created earlier to practice inspection techniques.
print("Student DataFrame for inspection:")
print(df_students)
# 1. View Top Rows with head()
print("\nFirst 2 rows (head):")
print(df_students.head(2)) # Default is 5 rows
# 2. View Bottom Rows with tail()
print("\nLast 2 rows (tail):")
print(df_students.tail(2)) # Default is 5 rows
# 3. Get Concise Summary with info()
print("\nDataFrame Info:")
df_students.info()
# This shows column names, non-null counts, and data types (dtypes).
# Notice 'Name' and 'Major' are 'object' (typically strings; recent pandas releases may report a dedicated string dtype instead), 'Age' is int64, and 'GPA' is float64.
# 4. Get Statistical Summary with describe()
print("\nStatistical Description:")
print(df_students.describe())
# Provides count, mean, std deviation, min, max, and quartiles for NUMERICAL columns.
# 'Name' and 'Major' (object types) are excluded by default.
# 5. Check Dimensions with shape
print("\nDataFrame Shape (rows, columns):", df_students.shape)
# 6. List Column Names
print("\nColumn Names:", df_students.columns)
# 7. View the Index
print("\nIndex:", df_students.index) # Shows the default RangeIndex here
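A few further inspection calls complement the ones above, especially for text columns that `describe()` skips by default. This is a sketch rather than an exhaustive list; it rebuilds the student DataFrame so it can run independently, and `major_counts` and `full_summary` are illustrative names.

```python
import pandas as pd

df_students = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [20, 21, 19, 22],
    'Major': ['CompSci', 'Physics', 'Math', 'CompSci'],
    'GPA': [3.8, 3.5, 3.9, 3.2],
})

# dtypes lists the data type of each column
print(df_students.dtypes)

# value_counts() tallies how often each distinct value appears
major_counts = df_students['Major'].value_counts()
print(major_counts)

# include='all' makes describe() report on text columns as well
full_summary = df_students.describe(include='all')
print(full_summary)
```

For text columns, the extra rows in the summary (such as `unique`, `top`, and `freq`) describe distinct values and the most common one; numeric-only statistics show NaN there.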
This practical session covered creating Series and DataFrames from various common data structures like lists, dictionaries, and NumPy arrays. We also practiced using essential methods like head(), tail(), info(), and describe(), along with attributes like shape, columns, and index, to quickly understand the structure and content of our data. These are fundamental skills you'll use constantly when working with Pandas.
© 2025 ApX Machine Learning