While .loc is powerful for selecting data based on labels, sometimes you need to access data based purely on its integer position, regardless of the index labels or column names. This is particularly useful when you know the order of rows or columns but not necessarily their labels, or when working with DataFrames that have default integer indices. For this purpose, Pandas provides the .iloc accessor.
The .iloc accessor behaves much like standard Python list and NumPy array indexing. It uses 0-based integer positions to select rows and columns. Remember that Python slicing conventions apply: the start bound is inclusive, and the end bound is exclusive.
The general syntax is DataFrame.iloc[row_indexer, column_indexer]. Both row_indexer and column_indexer accept integers, lists of integers, integer slices, or boolean arrays (though boolean indexing is often clearer using standard [] or .loc). If you only provide one indexer, it's assumed to be for the rows.
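As a minimal sketch of the two less obvious cases just mentioned, the snippet below passes a boolean mask and a single indexer to .iloc. The frame df_demo and its values are made up for illustration. Note that .iloc expects a plain boolean list or NumPy array, so the condition is converted with to_numpy() first; a labelled boolean Series is generally rejected as a mask.

```python
import pandas as pd

# Hypothetical demo frame; names and values are illustrative only
df_demo = pd.DataFrame(
    {"a": [1, 2, 3, 4], "b": [10.0, 20.0, 30.0, 40.0]},
    index=["w", "x", "y", "z"],
)

# Boolean list: keeps the rows whose mask entry is True (positions 0 and 2)
by_list = df_demo.iloc[[True, False, True, False]]

# Boolean NumPy array built from a condition; to_numpy() drops the labels
mask = (df_demo["a"] > 2).to_numpy()
by_array = df_demo.iloc[mask]

# A single indexer applies to rows, so these two selections are equivalent
rows_only = df_demo.iloc[1:3]
rows_and_all_cols = df_demo.iloc[1:3, :]

print(by_list)
print(by_array)
print(rows_only.equals(rows_and_all_cols))
```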
Let's use a sample DataFrame to illustrate:
import pandas as pd
import numpy as np
data = {'col_a': [10, 20, 30, 40, 50],
'col_b': [0.1, 0.2, 0.3, 0.4, 0.5],
'col_c': ['x', 'y', 'z', 'x', 'y']}
# Note the custom string index
df_example = pd.DataFrame(data, index=['row1', 'row2', 'row3', 'row4', 'row5'])
print("Sample DataFrame:")
print(df_example)
Sample DataFrame:
      col_a  col_b col_c
row1     10    0.1     x
row2     20    0.2     y
row3     30    0.3     z
row4     40    0.4     x
row5     50    0.5     y
To select a single row by its integer position, pass the integer to .iloc:
# Select the first row (position 0)
first_row = df_example.iloc[0]
print("\nFirst row (position 0):")
print(first_row)
# Select the third row (position 2)
third_row = df_example.iloc[2]
print("\nThird row (position 2):")
print(third_row)
First row (position 0):
col_a     10
col_b    0.1
col_c      x
Name: row1, dtype: object

Third row (position 2):
col_a     30
col_b    0.3
col_c      z
Name: row3, dtype: object
Notice that even though our index consists of strings ('row1', 'row2', etc.), .iloc accesses the rows based on their 0-based integer position. The result is a Pandas Series containing the data from that row, with the original column names as its index.
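The return type depends on the form of the indexer. As a small sketch (rebuilding the sample frame so the snippet runs on its own), a bare integer returns a Series, while a one-element list keeps the result as a one-row DataFrame:

```python
import pandas as pd

# Same data as the sample DataFrame, rebuilt so this snippet is self-contained
df_example = pd.DataFrame(
    {"col_a": [10, 20, 30, 40, 50],
     "col_b": [0.1, 0.2, 0.3, 0.4, 0.5],
     "col_c": ["x", "y", "z", "x", "y"]},
    index=["row1", "row2", "row3", "row4", "row5"],
)

as_series = df_example.iloc[0]    # integer -> Series
as_frame = df_example.iloc[[0]]   # one-element list -> one-row DataFrame

print(type(as_series).__name__)
print(type(as_frame).__name__)
print(as_frame.shape)
```

Because a row mixes integers, floats, and strings, the Series form falls back to the object dtype, whereas the one-row DataFrame keeps each column's own dtype.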
You can select multiple specific rows by providing a list of integers, or a range of rows using slice notation.
# Select the first and third rows (positions 0 and 2)
rows_0_2 = df_example.iloc[[0, 2]]
print("\nRows at positions 0 and 2:")
print(rows_0_2)
# Select rows from position 1 up to (but not including) position 4
rows_1_to_3 = df_example.iloc[1:4]  # Selects rows at positions 1, 2, 3
print("\nRows from position 1 up to 4:")
print(rows_1_to_3)
# Select from the beginning up to position 3 (exclusive)
first_three_rows = df_example.iloc[:3]
print("\nFirst three rows:")
print(first_three_rows)
# Select from position 3 to the end
last_rows = df_example.iloc[3:]
print("\nRows from position 3 to end:")
print(last_rows)
Rows at positions 0 and 2:
      col_a  col_b col_c
row1     10    0.1     x
row3     30    0.3     z

Rows from position 1 up to 4:
      col_a  col_b col_c
row2     20    0.2     y
row3     30    0.3     z
row4     40    0.4     x

First three rows:
      col_a  col_b col_c
row1     10    0.1     x
row2     20    0.2     y
row3     30    0.3     z

Rows from position 3 to end:
      col_a  col_b col_c
row4     40    0.4     x
row5     50    0.5     y
As expected, selecting multiple rows returns a new DataFrame containing the specified rows. The slice start:end includes start but excludes end, consistent with Python standards.
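Following Python's conventions also affects out-of-range positions. The sketch below, which uses a small made-up frame, shows that a slice running past the end is truncated without complaint, whereas a single out-of-range integer raises an IndexError:

```python
import pandas as pd

# Hypothetical demo frame with five rows
df_demo = pd.DataFrame({"a": [10, 20, 30, 40, 50]}, index=list("vwxyz"))

# Slice end beyond the length: truncated to the rows that exist
tail = df_demo.iloc[3:100]
print(list(tail.index))

# Slice entirely out of range: an empty DataFrame, no error
empty = df_demo.iloc[10:20]
print(len(empty))

# A single out-of-range integer is an error
try:
    df_demo.iloc[10]
    raised = False
except IndexError:
    raised = True
print(raised)
```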
.iloc really shines when you need to select specific elements or subsections based on both row and column positions. You provide the row indexer first, followed by the column indexer, separated by a comma.
# Select the element at row position 1, column position 0
element_1_0 = df_example.iloc[1, 0]
print(f"\nElement at row position 1, column position 0: {element_1_0}")
# Select the first row (position 0) and the first two columns (positions 0, 1)
row0_cols01 = df_example.iloc[0, 0:2]
print("\nFirst row, first two columns:")
print(row0_cols01)
# Select the first three rows (positions 0, 1, 2) and the first and third columns (positions 0, 2)
subset = df_example.iloc[0:3, [0, 2]]
print("\nFirst three rows, columns 0 and 2:")
print(subset)
# Select all rows and the last column (position -1 works!)
last_col = df_example.iloc[:, -1]
print("\nAll rows, last column:")
print(last_col)
Element at row position 1, column position 0: 20

First row, first two columns:
col_a     10
col_b    0.1
Name: row1, dtype: object

First three rows, columns 0 and 2:
      col_a col_c
row1     10     x
row2     20     y
row3     30     z

All rows, last column:
row1    x
row2    y
row3    z
row4    x
row5    y
Name: col_c, dtype: object
Using : selects all rows or all columns, similar to NumPy slicing. Negative indices count from the end (e.g., -1 is the last column).
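Negative positions and step values work for rows as well as columns. A brief sketch, again with a made-up frame whose names are purely illustrative:

```python
import pandas as pd

# Hypothetical demo frame
df_demo = pd.DataFrame(
    {"a": [1, 2, 3, 4, 5],
     "b": list("vwxyz"),
     "c": [0.5, 1.5, 2.5, 3.5, 4.5]},
    index=["r1", "r2", "r3", "r4", "r5"],
)

last_row = df_demo.iloc[-1]              # last row, returned as a Series
last_two_rows = df_demo.iloc[-2:]        # final two rows
every_other = df_demo.iloc[::2]          # rows at positions 0, 2, 4
all_but_last_col = df_demo.iloc[:, :-1]  # every column except the last

print(last_row.name)
print(list(last_two_rows.index))
print(list(every_other.index))
print(list(all_but_last_col.columns))
```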
.iloc vs .loc
It's fundamental to remember the difference:
.loc: Selects based on labels (index labels, column names). Slicing with labels is inclusive of both start and end labels.
.iloc: Selects based on integer position (0-based). Slicing with integers is exclusive of the end position.
Trying to use labels with .iloc, or integer positions with .loc (unless the labels happen to be integers), will raise an error, typically a TypeError or KeyError. Understanding this distinction is essential for correctly retrieving the data you need.
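The contrast can be sketched side by side. The snippet below rebuilds the sample frame, compares a label slice with a position slice, and then makes the two mismatched calls. The helper error_name is a hypothetical convenience written for this example, and the exact exception types may differ between pandas versions, so it only reports whatever was raised.

```python
import pandas as pd

# Same data as the sample DataFrame, rebuilt so this snippet is self-contained
df_example = pd.DataFrame(
    {"col_a": [10, 20, 30, 40, 50],
     "col_b": [0.1, 0.2, 0.3, 0.4, 0.5],
     "col_c": ["x", "y", "z", "x", "y"]},
    index=["row1", "row2", "row3", "row4", "row5"],
)

# Label slice: both 'row2' and 'row4' are included, giving three rows
by_label = df_example.loc["row2":"row4"]

# Position slice: position 3 is excluded, giving two rows (row2, row3)
by_position = df_example.iloc[1:3]

print(list(by_label.index))
print(list(by_position.index))


def error_name(func):
    """Hypothetical helper: name of the exception func raises, or None."""
    try:
        func()
    except Exception as exc:
        return type(exc).__name__
    return None


# A label passed to .iloc is rejected
print(error_name(lambda: df_example.iloc["row1"]))

# An integer passed to .loc is treated as a label, which this index lacks
print(error_name(lambda: df_example.loc[0]))
```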
Mastering .iloc provides a precise way to access data based on its position within the DataFrame structure, complementing the label-based selection offered by .loc.
© 2025 ApX Machine Learning