While selecting columns by name is straightforward, Pandas provides more powerful mechanisms for selecting rows or combinations of rows and columns. The primary label-based selection method is the .loc accessor. Think of .loc as selecting data by its "name" or "label", whether that is the name you've given to a row (its index label) or the name of a column.
This is different from selecting by numerical position, which we'll cover with .iloc in the next section. Using labels makes your code more readable and less prone to breaking if the order of your data changes, as long as the labels remain consistent.
The basic syntax involves passing the desired row label(s) and, optionally, column label(s) inside square brackets [] after .loc:
# General Syntax
# dataframe.loc[row_label_selector, column_label_selector]
Let's create a sample DataFrame to illustrate how .loc works. Imagine we have weather data for a few days:
import pandas as pd
import numpy as np
data = {'Temperature': [25, 28, 22, 31, 29],
        'Humidity': [60, 55, 70, 50, 65],
        'WindSpeed': [10, 12, 8, 15, 11]}
index_labels = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri']
weather_df = pd.DataFrame(data, index=index_labels)
print(weather_df)
     Temperature  Humidity  WindSpeed
Mon           25        60         10
Tue           28        55         12
Wed           22        70          8
Thu           31        50         15
Fri           29        65         11
To select a single row, pass its index label to .loc. The result will be a Pandas Series whose index is the original DataFrame's column names.
# Select the row with index label 'Wed'
wednesday_data = weather_df.loc['Wed']
print(wednesday_data)
print(type(wednesday_data))
Temperature    22
Humidity       70
WindSpeed       8
Name: Wed, dtype: int64
<class 'pandas.core.series.Series'>
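One practical consequence of selecting by label is that the label has to exist. The sketch below, which rebuilds the same weather_df so it can run on its own, shows what happens when it does not. The label 'Sat' and the variable names are made up for illustration.

```python
import pandas as pd

weather_df = pd.DataFrame(
    {'Temperature': [25, 28, 22, 31, 29],
     'Humidity': [60, 55, 70, 50, 65],
     'WindSpeed': [10, 12, 8, 15, 11]},
    index=['Mon', 'Tue', 'Wed', 'Thu', 'Fri'],
)

# 'Sat' is not in the index, so .loc raises KeyError rather than returning empty data
try:
    weather_df.loc['Sat']
    missing_label_raised = False
except KeyError:
    missing_label_raised = True

print(missing_label_raised)

# Checking membership in the index first is one way to avoid the exception
label = 'Sat'
row = weather_df.loc[label] if label in weather_df.index else None
print(row)
```

Whether to catch the exception or test membership first is a matter of taste; both rely only on the index labels.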
You can select multiple specific rows by providing a list of index labels. The result is a new DataFrame containing only the specified rows.
# Select rows for 'Mon' and 'Fri'
mon_fri_data = weather_df.loc[['Mon', 'Fri']]
print(mon_fri_data)
print(type(mon_fri_data))
     Temperature  Humidity  WindSpeed
Mon           25        60         10
Fri           29        65         11
<class 'pandas.core.frame.DataFrame'>
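Two details of list-based selection are easy to miss: the rows come back in the order of the list, not the order of the original index, and a list with a single label still produces a DataFrame rather than a Series. A small self-contained sketch, with illustrative variable names:

```python
import pandas as pd

weather_df = pd.DataFrame(
    {'Temperature': [25, 28, 22, 31, 29],
     'Humidity': [60, 55, 70, 50, 65],
     'WindSpeed': [10, 12, 8, 15, 11]},
    index=['Mon', 'Tue', 'Wed', 'Thu', 'Fri'],
)

# Rows follow the order of the list passed to .loc
reordered = weather_df.loc[['Fri', 'Mon']]
print(list(reordered.index))

# A one-element list keeps the result two-dimensional
single_row_frame = weather_df.loc[['Wed']]
print(type(single_row_frame).__name__, single_row_frame.shape)
```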
.loc also supports slicing using index labels. A significant difference from standard Python slicing or position-based slicing (which we'll see with .iloc) is that label-based slicing with .loc includes both the start and the end label.
# Select rows from 'Tue' up to and including 'Thu'
tue_to_thu_data = weather_df.loc['Tue':'Thu']
print(tue_to_thu_data)
     Temperature  Humidity  WindSpeed
Tue           28        55         12
Wed           22        70          8
Thu           31        50         15
Notice how the row labeled 'Thu' is included in the output. This inclusive behavior applies only when using labels for slicing.
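To make the contrast concrete, the following sketch places a plain Python list slice, a label slice with .loc, and a position slice with .iloc side by side. The .iloc call is only a preview of the next section, and the variable names are illustrative.

```python
import pandas as pd

days = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri']
weather_df = pd.DataFrame(
    {'Temperature': [25, 28, 22, 31, 29],
     'Humidity': [60, 55, 70, 50, 65],
     'WindSpeed': [10, 12, 8, 15, 11]},
    index=days,
)

# Plain Python slicing excludes the stop position
python_slice = days[1:3]

# Label slicing with .loc includes the stop label 'Thu'
label_slice = weather_df.loc['Tue':'Thu']

# Position slicing with .iloc excludes the stop position, like a list
position_slice = weather_df.iloc[1:3]

print(python_slice)
print(list(label_slice.index))
print(list(position_slice.index))
```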
The real power of .loc comes when you select both rows and columns by label. You provide the row selector first, followed by a comma, and then the column selector.
# Select Temperature for Wednesday
temp_wed = weather_df.loc['Wed', 'Temperature']
print(f"Temperature on Wednesday: {temp_wed}\n")
# Select Humidity and WindSpeed for Monday and Tuesday
hum_wind_mon_tue = weather_df.loc[['Mon', 'Tue'], ['Humidity', 'WindSpeed']]
print(hum_wind_mon_tue, "\n")
# Select all rows, but only 'Temperature' and 'Humidity' columns
temp_humidity = weather_df.loc[:, ['Temperature', 'Humidity']]
print(temp_humidity, "\n")
# Select rows 'Wed' through 'Fri' and columns 'Humidity' through 'WindSpeed' (inclusive slicing)
subset_slice = weather_df.loc['Wed':'Fri', 'Humidity':'WindSpeed']
print(subset_slice)
Temperature on Wednesday: 22

     Humidity  WindSpeed
Mon        60         10
Tue        55         12

     Temperature  Humidity
Mon           25        60
Tue           28        55
Wed           22        70
Thu           31        50
Fri           29        65

     Humidity  WindSpeed
Wed        70          8
Thu        50         15
Fri        65         11
In the third example above, the colon : used in the row position of weather_df.loc[:, ['Temperature', 'Humidity']] signifies "select all rows". Similarly, you could use : in the column position to select all columns for specific rows.
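As a brief sketch of the column-position case, the snippet below selects every column for two rows and checks that the result matches the shorter form with no column selector at all. The variable names are illustrative.

```python
import pandas as pd

weather_df = pd.DataFrame(
    {'Temperature': [25, 28, 22, 31, 29],
     'Humidity': [60, 55, 70, 50, 65],
     'WindSpeed': [10, 12, 8, 15, 11]},
    index=['Mon', 'Tue', 'Wed', 'Thu', 'Fri'],
)

# ':' in the column position means "all columns"
mon_tue_all_columns = weather_df.loc[['Mon', 'Tue'], :]

# Omitting the column selector gives the same result
mon_tue_short_form = weather_df.loc[['Mon', 'Tue']]

print(mon_tue_all_columns)
print(mon_tue_all_columns.equals(mon_tue_short_form))
```

Writing the colon explicitly is optional here, but it can make the intent clearer when the same code also contains selections that do restrict columns.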
While there's a dedicated section on boolean indexing coming up, it's useful to know that .loc works seamlessly with boolean arrays (Series) for row selection. You can create a condition, which results in a Series of True/False values, and pass this Series to .loc. It will return only the rows where the condition is True.
# Select days where Temperature was above 25 degrees
hot_days = weather_df.loc[weather_df['Temperature'] > 25]
print(hot_days)
     Temperature  Humidity  WindSpeed
Tue           28        55         12
Thu           31        50         15
Fri           29        65         11
Here, weather_df['Temperature'] > 25 creates a boolean Series:
Mon    False
Tue     True
Wed    False
Thu     True
Fri     True
Name: Temperature, dtype: bool
Passing this Series to weather_df.loc[...] selects the rows corresponding to the True values.
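A boolean Series can also sit in the row position while a label or list of labels sits in the column position. The sketch below shows that combination on the same data; the names is_hot, hot_humidity, and hot_subset are illustrative.

```python
import pandas as pd

weather_df = pd.DataFrame(
    {'Temperature': [25, 28, 22, 31, 29],
     'Humidity': [60, 55, 70, 50, 65],
     'WindSpeed': [10, 12, 8, 15, 11]},
    index=['Mon', 'Tue', 'Wed', 'Thu', 'Fri'],
)

# Boolean Series aligned with the DataFrame's index
is_hot = weather_df['Temperature'] > 25

# Rows chosen by the mask, one column chosen by label: the result is a Series
hot_humidity = weather_df.loc[is_hot, 'Humidity']

# Rows chosen by the mask, several columns chosen by a list: the result is a DataFrame
hot_subset = weather_df.loc[is_hot, ['Humidity', 'WindSpeed']]

print(hot_humidity)
print(hot_subset)
```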
Remember that .loc always operates on the labels of the index and columns. If your DataFrame has the default integer index (0, 1, 2, ...), then .loc will use these integers as labels.
df_int_index = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
print(df_int_index)
# Select row with index label 1
print("\nRow with label 1:")
print(df_int_index.loc[1])
   A  B
0  1  4
1  2  5
2  3  6

Row with label 1:
A    2
B    5
Name: 1, dtype: int64
Even though 1 looks like a position, .loc treats it as the name or label of that row in this context. This can sometimes be confusing, which is why the purely position-based .iloc accessor exists, as we will see next.
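Two small experiments help show that these integers really are labels. First, a slice through .loc stays inclusive even when the labels are integers. Second, with a non-default integer index, a label such as 0 may simply not exist. The frame named custom and its index values 10, 20, 30 are made up for this sketch.

```python
import pandas as pd

df_int_index = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Label slice: both 0 and 1 are included, so two rows come back
label_slice = df_int_index.loc[0:1]

# Position slice: the stop position is excluded, so one row comes back
position_slice = df_int_index.iloc[0:1]

print(len(label_slice), len(position_slice))

# A hypothetical frame whose integer labels do not start at 0
custom = pd.DataFrame({'A': [1, 2, 3]}, index=[10, 20, 30])
value_at_label_10 = custom.loc[10, 'A']

# No row is labeled 0 here, so .loc raises KeyError
try:
    custom.loc[0]
    label_zero_found = True
except KeyError:
    label_zero_found = False

print(value_at_label_10, label_zero_found)
```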
In summary, .loc is your tool for selecting data when you know the names (labels) of the rows and columns you want. It supports selecting single items, lists of items, and slices (inclusive) by label, making your selection logic clear and robust against changes in data order.
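The robustness claim can be checked with a short sketch: after sorting the rows by temperature, a label lookup still returns Wednesday's value, while a lookup by position now lands on a different day. The sort step and variable names are illustrative, and .iloc appears only for comparison.

```python
import pandas as pd

weather_df = pd.DataFrame(
    {'Temperature': [25, 28, 22, 31, 29],
     'Humidity': [60, 55, 70, 50, 65],
     'WindSpeed': [10, 12, 8, 15, 11]},
    index=['Mon', 'Tue', 'Wed', 'Thu', 'Fri'],
)

# Reorder the rows; the labels travel with their data
sorted_df = weather_df.sort_values('Temperature')

# Label-based lookup is unaffected by the new order
label_before = weather_df.loc['Wed', 'Temperature']
label_after = sorted_df.loc['Wed', 'Temperature']

# Position-based lookup now points at a different row
position_before = weather_df.iloc[2, 0]
position_after = sorted_df.iloc[2, 0]

print(label_before, label_after)
print(position_before, position_after)
```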
© 2025 ApX Machine Learning