Accessing specific data subsets in Pandas DataFrames is achieved through primary methods such as selecting columns by name, label-based indexing with .loc, positional indexing with .iloc, and filtering data using boolean conditions. Practical exercises will solidify understanding and build confidence in applying these techniques.
First, ensure you have Pandas imported. Typically, we import it under the alias pd.
import pandas as pd
import numpy as np # We might use numpy for creating data later
Next, let's create a sample DataFrame to work with. We'll make a small dataset representing information about different types of fruits.
# Create the data dictionary
data = {
'Fruit': ['Apple', 'Banana', 'Orange', 'Grape', 'Strawberry', 'Blueberry'],
'Color': ['Red', 'Yellow', 'Orange', 'Purple', 'Red', 'Blue'],
'Price': [1.2, 0.5, 0.8, 4.5, 3.0, 5.5],
'Quantity': [100, 150, 80, 200, 120, 90]
}
# Define meaningful index labels
index_labels = ['F01', 'F02', 'F03', 'F04', 'F05', 'F06']
# Create the DataFrame
inventory_df = pd.DataFrame(data, index=index_labels)
# Display the DataFrame
print("Original Fruit Inventory DataFrame:")
print(inventory_df)
This will output:
Original Fruit Inventory DataFrame:
Fruit Color Price Quantity
F01 Apple Red 1.2 100
F02 Banana Yellow 0.5 150
F03 Orange Orange 0.8 80
F04 Grape Purple 4.5 200
F05 Strawberry Red 3.0 120
F06 Blueberry Blue 5.5 90
Now, let's practice selecting data from inventory_df.
Select the Fruit column. Remember, accessing a single column returns a Pandas Series.
# Select the 'Fruit' column
fruit_column = inventory_df['Fruit']
print("\nSelecting the 'Fruit' column (returns a Series):")
print(fruit_column)
Selecting the 'Fruit' column (returns a Series):
F01 Apple
F02 Banana
F03 Orange
F04 Grape
F05 Strawberry
F06 Blueberry
Name: Fruit, dtype: object
Now, select both the Fruit and Price columns. Accessing multiple columns returns a DataFrame. Note the use of double square brackets [[]].
# Select 'Fruit' and 'Price' columns
fruit_price_df = inventory_df[['Fruit', 'Price']]
print("\nSelecting 'Fruit' and 'Price' columns (returns a DataFrame):")
print(fruit_price_df)
Selecting 'Fruit' and 'Price' columns (returns a DataFrame):
Fruit Price
F01 Apple 1.2
F02 Banana 0.5
F03 Orange 0.8
F04 Grape 4.5
F05 Strawberry 3.0
F06 Blueberry 5.5
.loc (Label-Based)Select the row corresponding to the label F03 (Orange).
# Select row with label 'F03'
orange_row = inventory_df.loc['F03']
print("\nSelecting row with label 'F03' using .loc:")
print(orange_row)
Selecting row with label 'F03' using .loc:
Fruit Orange
Color Orange
Price 0.8
Quantity 80
Name: F03, dtype: object
Select the Price for the fruit with label F05 (Strawberry).
# Select the 'Price' for label 'F05'
strawberry_price = inventory_df.loc['F05', 'Price']
print(f"\nPrice of fruit 'F05' using .loc: {strawberry_price}")
Price of fruit 'F05' using .loc: 3.0
Select the rows for labels F02 to F04 (inclusive) and only the Fruit and Quantity columns.
# Select rows 'F02' through 'F04', columns 'Fruit' and 'Quantity'
subset_loc = inventory_df.loc['F02':'F04', ['Fruit', 'Quantity']]
print("\nSelecting rows 'F02' to 'F04' and specific columns using .loc:")
print(subset_loc)
Selecting rows 'F02' to 'F04' and specific columns using .loc:
Fruit Quantity
F02 Banana 150
F03 Orange 80
F04 Grape 200
Notice how .loc includes the end label in the slice (F04 is included).
.iloc (Position-Based)Select the third row (which corresponds to Orange, at index position 2, as indexing starts from 0).
# Select the third row (index position 2)
third_row_iloc = inventory_df.iloc[2]
print("\nSelecting the third row (index 2) using .iloc:")
print(third_row_iloc)
Selecting the third row (index 2) using .iloc:
Fruit Orange
Color Orange
Price 0.8
Quantity 80
Name: F03, dtype: object
Select the value at the intersection of the 4th row (index 3) and the 2nd column (index 1). This should be the color 'Purple'.
# Select value at row index 3, column index 1
value_iloc = inventory_df.iloc[3, 1]
print(f"\nValue at row index 3, column index 1 using .iloc: {value_iloc}")
Value at row index 3, column index 1 using .iloc: Purple
Select the first three rows (indices 0, 1, 2) and the first two columns (indices 0, 1).
# Select the first 3 rows and first 2 columns
subset_iloc = inventory_df.iloc[0:3, 0:2] # Note: 3 and 2 are exclusive here
print("\nSelecting first 3 rows and first 2 columns using .iloc:")
print(subset_iloc)
Selecting first 3 rows and first 2 columns using .iloc:
Fruit Color
F01 Apple Red
F02 Banana Yellow
F03 Orange Orange
Remember that slicing with .iloc excludes the end position, similar to standard Python list slicing.
Select all fruits that are 'Red'.
# Create a boolean condition: Color is 'Red'
red_fruits_condition = inventory_df['Color'] == 'Red'
print("\nBoolean Series for 'Color' == 'Red':")
print(red_fruits_condition)
# Apply the condition to the DataFrame
red_fruits_df = inventory_df[red_fruits_condition]
print("\nSelecting rows where Color is 'Red':")
print(red_fruits_df)
Boolean Series for 'Color' == 'Red':
F01 True
F02 False
F03 False
F04 False
F05 True
F06 False
Name: Color, dtype: bool
Selecting rows where Color is 'Red':
Fruit Color Price Quantity
F01 Apple Red 1.2 100
F05 Strawberry Red 3.0 120
Select all fruits with a Price greater than 1.00 and a Quantity less than 150. Use the & operator for combining conditions.
# Condition 1: Price > 1.0
price_condition = inventory_df['Price'] > 1.0
# Condition 2: Quantity < 150
quantity_condition = inventory_df['Quantity'] < 150
# Combine conditions using '&' (AND)
combined_condition = price_condition & quantity_condition
# Apply the combined condition
filtered_df = inventory_df[combined_condition]
print("\nSelecting rows where Price > 1.0 AND Quantity < 150:")
print(filtered_df)
Selecting rows where Price > 1.0 AND Quantity < 150:
Fruit Color Price Quantity
F01 Apple Red 1.2 100
F05 Strawberry Red 3.0 120
F06 Blueberry Blue 5.5 90
Select the Fruit name and Price for all fruits that are 'Red' OR have a Price less than 0.60. Use the | operator.
# Condition 1: Color is 'Red'
color_condition = inventory_df['Color'] == 'Red'
# Condition 2: Price < 0.6
price_condition_low = inventory_df['Price'] < 0.6
# Combine conditions using '|' (OR)
combined_or_condition = color_condition | price_condition_low
# Apply the condition and select specific columns
filtered_subset = inventory_df.loc[combined_or_condition, ['Fruit', 'Price']]
print("\nSelecting 'Fruit' and 'Price' where Color is 'Red' OR Price < 0.6:")
print(filtered_subset)
Selecting 'Fruit' and 'Price' where Color is 'Red' OR Price < 0.6:
Fruit Price
F01 Apple 1.2
F02 Banana 0.5
F05 Strawberry 3.0
Notice how we combined boolean indexing with .loc to select specific columns for the filtered rows.
Sometimes, it's useful to set one of the columns as the DataFrame index, especially if it contains unique identifiers. Let's set the Fruit column as the index.
# Set 'Fruit' column as the index
# inplace=True modifies the DataFrame directly
# inventory_df.set_index('Fruit', inplace=True)
# Alternatively, create a new DataFrame with the new index
inventory_df_fruit_index = inventory_df.set_index('Fruit')
print("\nDataFrame with 'Fruit' as index:")
print(inventory_df_fruit_index)
DataFrame with 'Fruit' as index:
Color Price Quantity
Fruit
Apple Red 1.2 100
Banana Yellow 0.5 150
Orange Orange 0.8 80
Grape Purple 4.5 200
Strawberry Red 3.0 120
Blueberry Blue 5.5 90
Now, try selecting the 'Orange' row using .loc with the new index.
# Select 'Orange' using the new index
orange_data = inventory_df_fruit_index.loc['Orange']
print("\nSelecting 'Orange' data using the new fruit index:")
print(orange_data)
Selecting 'Orange' data using the new fruit index:
Color Orange
Price 0.8
Quantity 80
Name: Orange, dtype: object
Finally, let's reset the index back to the default integer index, turning the 'Fruit' index back into a regular column.
# Reset the index
inventory_df_reset = inventory_df_fruit_index.reset_index()
print("\nDataFrame after resetting the index:")
print(inventory_df_reset)
DataFrame after resetting the index:
Fruit Color Price Quantity
0 Apple Red 1.2 100
1 Banana Yellow 0.5 150
2 Orange Orange 0.8 80
3 Grape Purple 4.5 200
4 Strawberry Red 3.0 120
5 Blueberry Blue 5.5 90
You can see the Fruit column is back, and a default integer index (0, 1, 2...) has been added.
This hands-on session provided practice in applying the core data selection techniques in Pandas. Being comfortable with [] for column selection, .loc for label-based access, .iloc for position-based access, and boolean indexing for conditional filtering is fundamental for almost any data analysis task. Experiment further with the inventory_df DataFrame or try loading your own datasets (as learned in the previous chapter) and practice selecting different subsets of data.
Was this section helpful?
© 2026 ApX Machine LearningEngineered with