Accessing and modifying data efficiently is central to data manipulation. Pandas provides several tools for indexing and selection that let you retrieve and update data by label, by position, or by condition. This section walks through the basics of indexing and selecting data in Pandas, using simple examples to illustrate each concept.
In Pandas, both Series and DataFrames have an index, which holds a label for each element or row. These labels let you pinpoint the location of data within a dataset. By default, Pandas assigns an integer index starting from zero, but you can replace it with labels that better suit your data, such as a column of IDs or dates. Index labels are not required to be unique, although unique labels keep lookups unambiguous.
Here's how you can create a DataFrame with a custom index:
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
}
# Use the labels 'A', 'B', 'C' as the row index instead of 0, 1, 2
df = pd.DataFrame(data, index=['A', 'B', 'C'])
print(df)
This will output:
      Name  Age         City
A    Alice   25     New York
B      Bob   30  Los Angeles
C  Charlie   35      Chicago
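The index can also come from an existing column, as mentioned above for IDs or dates. The following is a minimal sketch of that idea; the records DataFrame and its ID column are hypothetical and not part of the original example.

```python
import pandas as pd

# Hypothetical records with an 'ID' column (assumed for illustration)
records = pd.DataFrame({
    'ID': [101, 102, 103],
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
})

# set_index returns a new DataFrame that uses the 'ID' values as row labels
by_id = records.set_index('ID')
print(by_id)

# reset_index moves the labels back into an ordinary column
restored = by_id.reset_index()
print(restored.columns.tolist())
```

Both calls return new objects by default, so records itself is left unchanged.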
loc and iloc
Pandas provides two main indexers for selecting data: loc and iloc. Both are used with square brackets rather than called like functions. loc accesses rows and columns by label, while iloc accesses them by integer position.
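The difference between labels and positions is easiest to see when the labels are themselves integers. The sketch below uses a hypothetical Series (not from the original example) whose integer labels are deliberately out of order.

```python
import pandas as pd

# Hypothetical Series whose integer labels differ from the positions
s = pd.Series(['x', 'y', 'z'], index=[2, 0, 1])

by_label = s.loc[0]      # the element whose label is 0
by_position = s.iloc[0]  # the first element, whatever its label

print(by_label)     # y
print(by_position)  # x
```

With the default index the two happen to agree, which is why the distinction is easy to overlook until the index is customized.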
loc
The loc indexer is label-based, meaning you use the row and column labels to select data. This is particularly useful when you have a custom index. Here's how you can select data using loc:
# Select a single row by label
row_b = df.loc['B']
print(row_b)
# Select multiple rows and specific columns by label
subset = df.loc[['A', 'C'], ['Name', 'City']]
print(subset)
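A few more loc patterns are worth trying, shown here as a sketch on the same example data (rebuilt so the block runs on its own). The variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35],
        'City': ['New York', 'Los Angeles', 'Chicago'],
    },
    index=['A', 'B', 'C'],
)

# A single label returns a Series; a one-item list returns a DataFrame
row_as_series = df.loc['B']
row_as_frame = df.loc[['B']]

# Label slices include BOTH endpoints: rows 'A' and 'B', columns 'Name' and 'Age'
label_slice = df.loc['A':'B', 'Name':'Age']
print(label_slice)

# A row label plus a column label returns a single value
bob_city = df.loc['B', 'City']
print(bob_city)
```

The inclusive endpoint is specific to label slicing and differs from ordinary Python slices.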
iloc
In contrast, the iloc indexer is integer-position-based: it selects rows and columns by their zero-based position, regardless of the labels:
# Select a single row by index position
row_0 = df.iloc[0]
print(row_0)
# Select multiple rows and specific columns by index position
subset = df.iloc[[0, 2], [0, 2]]
print(subset)
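Position-based slicing follows the usual Python conventions. The sketch below, on the same example data (rebuilt so the block runs on its own), shows slices, negative positions, and scalar access; the variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35],
        'City': ['New York', 'Los Angeles', 'Chicago'],
    },
    index=['A', 'B', 'C'],
)

# Position slices EXCLUDE the stop position: rows 0-1 and columns 0-1
pos_slice = df.iloc[0:2, 0:2]
print(pos_slice)

# Negative positions count from the end
last_row = df.iloc[-1]
print(last_row['Name'])

# A row position plus a column position returns a single value
first_age = df.iloc[0, 1]
print(first_age)
```

Note the contrast with loc, where a label slice includes its end label.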
Boolean indexing lets you filter data based on conditions. You create a boolean mask by applying a condition to a column, and then pass this mask to the DataFrame to keep only the rows where the mask is True:
# Filter rows where Age is greater than 28
age_filter = df['Age'] > 28
filtered_df = df[age_filter]
print(filtered_df)
This will output:
      Name  Age         City
B      Bob   30  Los Angeles
C  Charlie   35      Chicago
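Masks can also be combined, or passed to loc together with a column selection. The following sketch uses the same example data (rebuilt so the block runs on its own); the particular conditions are chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35],
        'City': ['New York', 'Los Angeles', 'Chicago'],
    },
    index=['A', 'B', 'C'],
)

# Combine conditions with & (and) or | (or); each condition needs parentheses
over_28_not_chicago = df[(df['Age'] > 28) & (df['City'] != 'Chicago')]
print(over_28_not_chicago)

# Pass a mask to loc to filter rows and pick a column in one step
names_over_28 = df.loc[df['Age'] > 28, 'Name']
print(names_over_28)

# isin builds a mask from a list of accepted values
in_cities = df[df['City'].isin(['New York', 'Chicago'])]
print(in_cities)
```

The parentheses matter because & and | bind more tightly than comparison operators in Python.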
Once you've selected data, you might want to update it. Pandas makes it easy to set values, whether you're updating a single entry or a larger subset of your DataFrame:
# Update a single value
df.loc['A', 'Age'] = 26
# Update multiple values
df.loc[['A', 'B'], 'City'] = 'Unknown'
print(df)
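Selection and assignment can be combined to update only the rows that meet a condition. Here is a minimal sketch on a fresh copy of the example data; the replacement values are arbitrary.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35],
        'City': ['New York', 'Los Angeles', 'Chicago'],
    },
    index=['A', 'B', 'C'],
)

# Conditional update: the mask picks the rows, 'City' picks the column
df.loc[df['Age'] > 28, 'City'] = 'Unknown'

# In-place arithmetic on a selection: add one year for rows 'A' and 'B'
df.loc[['A', 'B'], 'Age'] += 1

print(df)
```

Writing through a single loc call is generally safer than chained indexing such as df[mask]['City'] = 'Unknown', which may assign to a temporary copy and leave df unchanged.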
Indexing and selection are fundamental operations in Pandas, enabling you to efficiently access and manipulate your data. As you continue exploring Pandas, these skills will become invaluable, allowing you to work with data more effectively. Remember, practice is key, so experiment with these methods on your datasets to deepen your understanding.
© 2025 ApX Machine Learning