After cleaning and reshaping your data, arranging it in a meaningful order is often a necessary step. Sorting allows you to view data from lowest to highest value, alphabetically, or based on custom criteria, making patterns easier to spot and specific entries simpler to locate. Pandas provides flexible methods to sort data based on either the index labels or the actual values within columns.
Sometimes, you need to arrange your data based on its row or column labels (the index). This is particularly useful if the index represents a time series, an ordered category, or simply needs to be in alphabetical or numerical order. The sort_index() method handles this.
Let's start with a simple DataFrame:
import pandas as pd
import numpy as np
data = {'col_b': [4, 7, 1, 8, 5],
'col_a': ['apple', 'banana', 'orange', 'apple', 'banana'],
'col_c': [10.0, np.nan, 20.0, 10.0, 15.0]}
df = pd.DataFrame(data, index=['R3', 'R1', 'R5', 'R2', 'R4'])
print("Original DataFrame:")
print(df)
Original DataFrame:
col_b col_a col_c
R3 4 apple 10.0
R1 7 banana NaN
R5 1 orange 20.0
R2 8 apple 10.0
R4 5 banana 15.0
Notice the row index (R3, R1, R5, etc.) is not in alphabetical order. To sort the DataFrame rows by their index labels:
df_sorted_by_index = df.sort_index()
print("\nDataFrame sorted by row index (ascending):")
print(df_sorted_by_index)
DataFrame sorted by row index (ascending):
col_b col_a col_c
R1 7 banana NaN
R2 8 apple 10.0
R3 4 apple 10.0
R4 5 banana 15.0
R5 1 orange 20.0
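One caveat worth knowing: string labels are compared character by character, so a label such as 'R10' would sort before 'R2'. The sketch below assumes hypothetical labels of the form 'R' followed by a number and uses the key argument of sort_index() (available in pandas 1.1 and later) to sort on the numeric part instead.

```python
import pandas as pd

# Hypothetical labels: as plain strings, 'R10' sorts before 'R2'
df = pd.DataFrame({'value': [1, 2, 3]}, index=['R10', 'R2', 'R1'])

lexicographic = df.sort_index()
print(lexicographic.index.tolist())  # ['R1', 'R10', 'R2']

# key receives the Index and must return an Index of the same length;
# here it strips the leading 'R' and compares the remaining digits as integers
natural = df.sort_index(key=lambda idx: idx.str[1:].astype(int))
print(natural.index.tolist())  # ['R1', 'R2', 'R10']
```

The key function only changes how labels are compared; the returned DataFrame keeps the original label strings.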
By default, sort_index() sorts in ascending order. To sort in descending order, use the ascending=False argument:
df_sorted_by_index_desc = df.sort_index(ascending=False)
print("\nDataFrame sorted by row index (descending):")
print(df_sorted_by_index_desc)
DataFrame sorted by row index (descending):
col_b col_a col_c
R5 1 orange 20.0
R4 5 banana 15.0
R3 4 apple 10.0
R2 8 apple 10.0
R1 7 banana NaN
You can also sort by the column index (the column names) by specifying axis=1:
df_sorted_by_columns = df.sort_index(axis=1)
print("\nDataFrame sorted by column index (ascending):")
print(df_sorted_by_columns)
DataFrame sorted by column index (ascending):
col_a col_b col_c
R3 apple 4 10.0
R1 banana 7 NaN
R5 orange 1 20.0
R2 apple 8 10.0
R4 banana 5 15.0
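Because each call returns a new DataFrame, row sorting and column sorting can be chained when you want both axes in order. This is a minimal sketch using the same example data; axis='columns' is an accepted alias for axis=1.

```python
import numpy as np
import pandas as pd

data = {'col_b': [4, 7, 1, 8, 5],
        'col_a': ['apple', 'banana', 'orange', 'apple', 'banana'],
        'col_c': [10.0, np.nan, 20.0, 10.0, 15.0]}
df = pd.DataFrame(data, index=['R3', 'R1', 'R5', 'R2', 'R4'])

# Sort rows by their index labels, then sort the columns by name
fully_sorted = df.sort_index().sort_index(axis='columns')
print(fully_sorted)
```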
Like many Pandas operations, sort_index() returns a new sorted DataFrame by default and leaves the original unchanged. If you want to modify the original DataFrame directly, use the inplace=True argument. Be cautious when using inplace=True: it overwrites your original data structure, and the method returns None instead of a DataFrame.
df_copy = df.copy() # Work on a copy to preserve original df
df_copy.sort_index(inplace=True)
print("\nCopied DataFrame after in-place sort by index:")
print(df_copy)
Copied DataFrame after in-place sort by index:
col_b col_a col_c
R1 7 banana NaN
R2 8 apple 10.0
R3 4 apple 10.0
R4 5 banana 15.0
R5 1 orange 20.0
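A frequent mistake with inplace=True is assigning the result back to a variable. The sketch below shows that the in-place call returns None, so writing df_copy = df_copy.sort_index(inplace=True) would replace the DataFrame with None.

```python
import numpy as np
import pandas as pd

data = {'col_b': [4, 7, 1, 8, 5],
        'col_a': ['apple', 'banana', 'orange', 'apple', 'banana'],
        'col_c': [10.0, np.nan, 20.0, 10.0, 15.0]}
df = pd.DataFrame(data, index=['R3', 'R1', 'R5', 'R2', 'R4'])

df_copy = df.copy()
result = df_copy.sort_index(inplace=True)
print(result)  # None: the sorted rows live in df_copy itself

# The non-inplace style: keep the returned copy and leave df untouched
df_sorted = df.sort_index()
print(df_sorted.index.tolist())  # ['R1', 'R2', 'R3', 'R4', 'R5']
print(df.index.tolist())         # ['R3', 'R1', 'R5', 'R2', 'R4']
```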
More frequently, you'll want to sort your DataFrame based on the values in one or more columns. The sort_values() method is used for this purpose. The most important argument for sort_values() is by, which specifies the column name (or list of column names) to sort by.
Let's sort our original DataFrame df based on the values in col_b:
df_sorted_by_col_b = df.sort_values(by='col_b')
print("\nDataFrame sorted by 'col_b' (ascending):")
print(df_sorted_by_col_b)
DataFrame sorted by 'col_b' (ascending):
col_b col_a col_c
R5 1 orange 20.0
R3 4 apple 10.0
R4 5 banana 15.0
R1 7 banana NaN
R2 8 apple 10.0
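Sorting keeps each row's original index label, which is why the labels above appear out of order. If the labels carry no meaning and you would prefer a fresh 0 to n-1 index, sort_values() accepts ignore_index=True (pandas 1.0 and later). A short sketch with the same data:

```python
import numpy as np
import pandas as pd

data = {'col_b': [4, 7, 1, 8, 5],
        'col_a': ['apple', 'banana', 'orange', 'apple', 'banana'],
        'col_c': [10.0, np.nan, 20.0, 10.0, 15.0]}
df = pd.DataFrame(data, index=['R3', 'R1', 'R5', 'R2', 'R4'])

# The original labels R1..R5 are discarded and replaced with 0..4
df_ranked = df.sort_values(by='col_b', ignore_index=True)
print(df_ranked)
```

This is equivalent to calling .reset_index(drop=True) on the sorted result.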
Again, the default sorting order is ascending. Use ascending=False for descending order:
df_sorted_by_col_b_desc = df.sort_values(by='col_b', ascending=False)
print("\nDataFrame sorted by 'col_b' (descending):")
print(df_sorted_by_col_b_desc)
DataFrame sorted by 'col_b' (descending):
col_b col_a col_c
R2 8 apple 10.0
R1 7 banana NaN
R4 5 banana 15.0
R3 4 apple 10.0
R5 1 orange 20.0
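A common use of a descending sort is picking the top n rows. The sketch below takes the three largest col_b values with sort_values() followed by head(), and shows DataFrame.nlargest() as a shortcut that returns the same rows for this data.

```python
import numpy as np
import pandas as pd

data = {'col_b': [4, 7, 1, 8, 5],
        'col_a': ['apple', 'banana', 'orange', 'apple', 'banana'],
        'col_c': [10.0, np.nan, 20.0, 10.0, 15.0]}
df = pd.DataFrame(data, index=['R3', 'R1', 'R5', 'R2', 'R4'])

# Sort descending, then keep the first three rows
top3 = df.sort_values(by='col_b', ascending=False).head(3)
print(top3)

# nlargest(n, columns) selects the same rows without a full sort call
top3_alt = df.nlargest(3, 'col_b')
print(top3_alt.index.tolist())  # ['R2', 'R1', 'R4']
```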
You can sort by multiple columns by passing a list of column names to the by argument. Pandas will sort by the first column in the list, then use the second column to break ties, and so on.
Let's sort by col_a (alphabetically) and then by col_b (numerically) for rows with the same col_a value:
df_sorted_by_multi = df.sort_values(by=['col_a', 'col_b'])
print("\nDataFrame sorted by 'col_a' then 'col_b' (ascending):")
print(df_sorted_by_multi)
DataFrame sorted by 'col_a' then 'col_b' (ascending):
col_b col_a col_c
R3 4 apple 10.0
R2 8 apple 10.0
R4 5 banana 15.0
R1 7 banana NaN
R5 1 orange 20.0
Notice how rows with 'apple' are together, sorted by col_b (4 then 8), and rows with 'banana' are together, sorted by col_b (5 then 7).
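When you sort on a single column and several rows share a value, the default algorithm (kind='quicksort') does not promise to keep those tied rows in their original order. Passing kind='stable' does. A minimal sketch sorting only by col_a; note that pandas applies kind only when sorting on a single column.

```python
import numpy as np
import pandas as pd

data = {'col_b': [4, 7, 1, 8, 5],
        'col_a': ['apple', 'banana', 'orange', 'apple', 'banana'],
        'col_c': [10.0, np.nan, 20.0, 10.0, 15.0]}
df = pd.DataFrame(data, index=['R3', 'R1', 'R5', 'R2', 'R4'])

# Tied rows keep their original relative order: R3 before R2, R1 before R4
by_fruit = df.sort_values(by='col_a', kind='stable')
print(by_fruit.index.tolist())  # ['R3', 'R2', 'R1', 'R4', 'R5']
```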
You can also specify different sorting orders for each column when sorting by multiple columns. Pass a list of booleans to the ascending argument, corresponding to the list passed to by.
Let's sort by col_a ascending and col_b descending:
df_sorted_by_multi_mixed = df.sort_values(by=['col_a', 'col_b'], ascending=[True, False])
print("\nDataFrame sorted by 'col_a' (asc) then 'col_b' (desc):")
print(df_sorted_by_multi_mixed)
DataFrame sorted by 'col_a' (asc) then 'col_b' (desc):
col_b col_a col_c
R2 8 apple 10.0
R3 4 apple 10.0
R1 7 banana NaN
R4 5 banana 15.0
R5 1 orange 20.0
Now, for 'apple', the row with col_b=8 comes before the row with col_b=4. For 'banana', the row with col_b=7 comes before the row with col_b=5.
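Neither alphabetical nor reverse-alphabetical order may match the order you actually need for a text column. One way to sort by custom criteria is to convert the column to an ordered pandas Categorical, whose sort order follows the category list. The order ['orange', 'apple', 'banana'] below is a hypothetical choice for illustration.

```python
import numpy as np
import pandas as pd

data = {'col_b': [4, 7, 1, 8, 5],
        'col_a': ['apple', 'banana', 'orange', 'apple', 'banana'],
        'col_c': [10.0, np.nan, 20.0, 10.0, 15.0]}
df = pd.DataFrame(data, index=['R3', 'R1', 'R5', 'R2', 'R4'])

# Hypothetical custom order: oranges first, then apples, then bananas
fruit_order = ['orange', 'apple', 'banana']
df_custom = df.copy()
df_custom['col_a'] = pd.Categorical(df_custom['col_a'],
                                    categories=fruit_order, ordered=True)

# col_a follows the category order; col_b breaks ties in descending order
result = df_custom.sort_values(by=['col_a', 'col_b'], ascending=[True, False])
print(result.index.tolist())  # ['R5', 'R2', 'R3', 'R1', 'R4']
```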
What happens to missing values (NaN) when sorting? By default, sort_values() places NaN values at the end of the sorted output, regardless of whether the sort is ascending or descending. You can control this behavior using the na_position argument, which accepts either 'first' or 'last'.
Let's sort col_c (which contains a NaN) and explicitly put the NaN first:
df_sorted_nan_first = df.sort_values(by='col_c', na_position='first')
print("\nDataFrame sorted by 'col_c', NaN first:")
print(df_sorted_nan_first)
DataFrame sorted by 'col_c', NaN first:
col_b col_a col_c
R1 7 banana NaN
R3 4 apple 10.0
R2 8 apple 10.0
R4 5 banana 15.0
R5 1 orange 20.0
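To confirm that the default NaN placement does not flip with the sort direction, the sketch below sorts col_c in descending order: the NaN row (R1) still ends up last unless na_position='first' is passed.

```python
import numpy as np
import pandas as pd

data = {'col_b': [4, 7, 1, 8, 5],
        'col_a': ['apple', 'banana', 'orange', 'apple', 'banana'],
        'col_c': [10.0, np.nan, 20.0, 10.0, 15.0]}
df = pd.DataFrame(data, index=['R3', 'R1', 'R5', 'R2', 'R4'])

# Descending sort: largest value first, NaN still placed at the end
desc = df.sort_values(by='col_c', ascending=False)
print(desc)

# Descending sort with NaN moved to the top
desc_nan_first = df.sort_values(by='col_c', ascending=False, na_position='first')
print(desc_nan_first)
```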
As with sort_index(), sort_values() also accepts the inplace=True argument to modify the DataFrame directly.
Sorting is a fundamental operation for organizing and understanding your data. Whether arranging rows by index labels or ordering them based on column contents, the sort_index() and sort_values() methods provide the necessary tools for bringing structure to your DataFrames.
© 2025 ApX Machine Learning