Alright, let's put the theory of data selection into practice. In the previous sections of this chapter, you learned about the primary ways to access data within Pandas DataFrames: selecting columns by name, using label-based indexing with .loc
, positional indexing with .iloc
, and filtering data using boolean conditions. Now, we'll work through some exercises to solidify your understanding and build confidence in using these techniques.
First, ensure you have Pandas imported. Typically, we import it under the alias pd
.
import pandas as pd
import numpy as np # We might use numpy for creating data later
Next, let's create a sample DataFrame to work with. We'll make a small dataset representing information about different types of fruits.
# Create the data dictionary
data = {
'Fruit': ['Apple', 'Banana', 'Orange', 'Grape', 'Strawberry', 'Blueberry'],
'Color': ['Red', 'Yellow', 'Orange', 'Purple', 'Red', 'Blue'],
'Price': [1.2, 0.5, 0.8, 4.5, 3.0, 5.5],
'Quantity': [100, 150, 80, 200, 120, 90]
}
# Define meaningful index labels
index_labels = ['F01', 'F02', 'F03', 'F04', 'F05', 'F06']
# Create the DataFrame
inventory_df = pd.DataFrame(data, index=index_labels)
# Display the DataFrame
print("Original Fruit Inventory DataFrame:")
print(inventory_df)
This will output:
Original Fruit Inventory DataFrame:
Fruit Color Price Quantity
F01 Apple Red 1.2 100
F02 Banana Yellow 0.5 150
F03 Orange Orange 0.8 80
F04 Grape Purple 4.5 200
F05 Strawberry Red 3.0 120
F06 Blueberry Blue 5.5 90
Now, let's practice selecting data from inventory_df
.
Select the Fruit
column. Remember, accessing a single column returns a Pandas Series.
# Select the 'Fruit' column
fruit_column = inventory_df['Fruit']
print("\nSelecting the 'Fruit' column (returns a Series):")
print(fruit_column)
Selecting the 'Fruit' column (returns a Series):
F01 Apple
F02 Banana
F03 Orange
F04 Grape
F05 Strawberry
F06 Blueberry
Name: Fruit, dtype: object
Now, select both the Fruit
and Price
columns. Accessing multiple columns returns a DataFrame. Note the use of double square brackets [[]]
.
# Select 'Fruit' and 'Price' columns
fruit_price_df = inventory_df[['Fruit', 'Price']]
print("\nSelecting 'Fruit' and 'Price' columns (returns a DataFrame):")
print(fruit_price_df)
Selecting 'Fruit' and 'Price' columns (returns a DataFrame):
Fruit Price
F01 Apple 1.2
F02 Banana 0.5
F03 Orange 0.8
F04 Grape 4.5
F05 Strawberry 3.0
F06 Blueberry 5.5
.loc
(Label-Based)Select the row corresponding to the label F03
(Orange).
# Select row with label 'F03'
orange_row = inventory_df.loc['F03']
print("\nSelecting row with label 'F03' using .loc:")
print(orange_row)
Selecting row with label 'F03' using .loc:
Fruit Orange
Color Orange
Price 0.8
Quantity 80
Name: F03, dtype: object
Select the Price
for the fruit with label F05
(Strawberry).
# Select the 'Price' for label 'F05'
strawberry_price = inventory_df.loc['F05', 'Price']
print(f"\nPrice of fruit 'F05' using .loc: {strawberry_price}")
Price of fruit 'F05' using .loc: 3.0
Select the rows for labels F02
to F04
(inclusive) and only the Fruit
and Quantity
columns.
# Select rows 'F02' through 'F04', columns 'Fruit' and 'Quantity'
subset_loc = inventory_df.loc['F02':'F04', ['Fruit', 'Quantity']]
print("\nSelecting rows 'F02' to 'F04' and specific columns using .loc:")
print(subset_loc)
Selecting rows 'F02' to 'F04' and specific columns using .loc:
Fruit Quantity
F02 Banana 150
F03 Orange 80
F04 Grape 200
Notice how .loc
includes the end label in the slice (F04
is included).
.iloc
(Position-Based)Select the third row (which corresponds to Orange, at index position 2, as indexing starts from 0).
# Select the third row (index position 2)
third_row_iloc = inventory_df.iloc[2]
print("\nSelecting the third row (index 2) using .iloc:")
print(third_row_iloc)
Selecting the third row (index 2) using .iloc:
Fruit Orange
Color Orange
Price 0.8
Quantity 80
Name: F03, dtype: object
Select the value at the intersection of the 4th row (index 3) and the 2nd column (index 1). This should be the color 'Purple'.
# Select value at row index 3, column index 1
value_iloc = inventory_df.iloc[3, 1]
print(f"\nValue at row index 3, column index 1 using .iloc: {value_iloc}")
Value at row index 3, column index 1 using .iloc: Purple
Select the first three rows (indices 0, 1, 2) and the first two columns (indices 0, 1).
# Select the first 3 rows and first 2 columns
subset_iloc = inventory_df.iloc[0:3, 0:2] # Note: 3 and 2 are exclusive here
print("\nSelecting first 3 rows and first 2 columns using .iloc:")
print(subset_iloc)
Selecting first 3 rows and first 2 columns using .iloc:
Fruit Color
F01 Apple Red
F02 Banana Yellow
F03 Orange Orange
Remember that slicing with .iloc
excludes the end position, similar to standard Python list slicing.
Select all fruits that are 'Red'.
# Create a boolean condition: Color is 'Red'
red_fruits_condition = inventory_df['Color'] == 'Red'
print("\nBoolean Series for 'Color' == 'Red':")
print(red_fruits_condition)
# Apply the condition to the DataFrame
red_fruits_df = inventory_df[red_fruits_condition]
print("\nSelecting rows where Color is 'Red':")
print(red_fruits_df)
Boolean Series for 'Color' == 'Red':
F01 True
F02 False
F03 False
F04 False
F05 True
F06 False
Name: Color, dtype: bool
Selecting rows where Color is 'Red':
Fruit Color Price Quantity
F01 Apple Red 1.2 100
F05 Strawberry Red 3.0 120
Select all fruits with a Price
greater than 1.00 and a Quantity
less than 150. Use the &
operator for combining conditions.
# Condition 1: Price > 1.0
price_condition = inventory_df['Price'] > 1.0
# Condition 2: Quantity < 150
quantity_condition = inventory_df['Quantity'] < 150
# Combine conditions using '&' (AND)
combined_condition = price_condition & quantity_condition
# Apply the combined condition
filtered_df = inventory_df[combined_condition]
print("\nSelecting rows where Price > 1.0 AND Quantity < 150:")
print(filtered_df)
Selecting rows where Price > 1.0 AND Quantity < 150:
Fruit Color Price Quantity
F01 Apple Red 1.2 100
F05 Strawberry Red 3.0 120
F06 Blueberry Blue 5.5 90
Select the Fruit
name and Price
for all fruits that are 'Red' OR have a Price
less than 0.60. Use the |
operator.
# Condition 1: Color is 'Red'
color_condition = inventory_df['Color'] == 'Red'
# Condition 2: Price < 0.6
price_condition_low = inventory_df['Price'] < 0.6
# Combine conditions using '|' (OR)
combined_or_condition = color_condition | price_condition_low
# Apply the condition and select specific columns
filtered_subset = inventory_df.loc[combined_or_condition, ['Fruit', 'Price']]
print("\nSelecting 'Fruit' and 'Price' where Color is 'Red' OR Price < 0.6:")
print(filtered_subset)
Selecting 'Fruit' and 'Price' where Color is 'Red' OR Price < 0.6:
Fruit Price
F01 Apple 1.2
F02 Banana 0.5
F05 Strawberry 3.0
Notice how we combined boolean indexing with .loc
to select specific columns for the filtered rows.
Sometimes, it's useful to set one of the columns as the DataFrame index, especially if it contains unique identifiers. Let's set the Fruit
column as the index.
# Set 'Fruit' column as the index
# inplace=True modifies the DataFrame directly
# inventory_df.set_index('Fruit', inplace=True)
# Alternatively, create a new DataFrame with the new index
inventory_df_fruit_index = inventory_df.set_index('Fruit')
print("\nDataFrame with 'Fruit' as index:")
print(inventory_df_fruit_index)
DataFrame with 'Fruit' as index:
Color Price Quantity
Fruit
Apple Red 1.2 100
Banana Yellow 0.5 150
Orange Orange 0.8 80
Grape Purple 4.5 200
Strawberry Red 3.0 120
Blueberry Blue 5.5 90
Now, try selecting the 'Orange' row using .loc
with the new index.
# Select 'Orange' using the new index
orange_data = inventory_df_fruit_index.loc['Orange']
print("\nSelecting 'Orange' data using the new fruit index:")
print(orange_data)
Selecting 'Orange' data using the new fruit index:
Color Orange
Price 0.8
Quantity 80
Name: Orange, dtype: object
Finally, let's reset the index back to the default integer index, turning the 'Fruit' index back into a regular column.
# Reset the index
inventory_df_reset = inventory_df_fruit_index.reset_index()
print("\nDataFrame after resetting the index:")
print(inventory_df_reset)
DataFrame after resetting the index:
Fruit Color Price Quantity
0 Apple Red 1.2 100
1 Banana Yellow 0.5 150
2 Orange Orange 0.8 80
3 Grape Purple 4.5 200
4 Strawberry Red 3.0 120
5 Blueberry Blue 5.5 90
You can see the Fruit
column is back, and a default integer index (0, 1, 2...) has been added.
This hands-on session provided practice in applying the core data selection techniques in Pandas. Being comfortable with []
for column selection, .loc
for label-based access, .iloc
for position-based access, and boolean indexing for conditional filtering is fundamental for almost any data analysis task. Experiment further with the inventory_df
DataFrame or try loading your own datasets (as learned in the previous chapter) and practice selecting different subsets of data.
© 2025 ApX Machine Learning