As you clean and prepare your data, you'll often find that the original column names are not ideal. They might be unclear, too long, contain spaces or special characters that are awkward to work with in code, or simply not follow a consistent naming convention. Renaming columns is a frequent and important step in making your DataFrame easier to understand and use.
Pandas provides the flexible `.rename()` method for this purpose. It changes column names (and index labels) without altering the data itself.
### The `.rename()` Method

The most common way to use `.rename()` is to pass a dictionary to its `columns` parameter. The dictionary maps old column names (the keys) to new column names (the values).
Let's start with a sample DataFrame:
```python
import pandas as pd
import numpy as np

# Sample DataFrame with less-than-ideal column names
data = {'Student ID': [101, 102, 103, 104],
        'Test Score (Math)': [85, 92, np.nan, 78],
        'Test Score (English)': [76, 88, 95, 80],
        'attendance %': [90, 95, 85, 92]}
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)
```
This will output:
```text
Original DataFrame:
   Student ID  Test Score (Math)  Test Score (English)  attendance %
0         101               85.0                    76            90
1         102               92.0                    88            95
2         103                NaN                    95            85
3         104               78.0                    80            92
```
Notice the column names contain spaces, parentheses, and symbols like '%'. Let's rename them to be more programming-friendly, using lowercase and underscores.
```python
# Define the renaming map: old name (key) -> new name (value)
rename_map = {
    'Student ID': 'student_id',
    'Test Score (Math)': 'math_score',
    'Test Score (English)': 'english_score',
    'attendance %': 'attendance_pct'
}

# Use .rename() with the columns parameter
df_renamed = df.rename(columns=rename_map)
print("\nDataFrame after renaming:")
print(df_renamed)
```
The output shows the new column names:
```text
DataFrame after renaming:
   student_id  math_score  english_score  attendance_pct
0         101        85.0             76              90
1         102        92.0             88              95
2         103         NaN             95              85
3         104        78.0             80              92
```
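The `columns` parameter also accepts a function instead of a dictionary. Pandas calls the function once per column name and uses its return value as the new name. The sketch below assumes a generic cleanup rule: lowercase everything, spell `%` as `pct`, and turn every run of other symbols into an underscore. `to_snake_case` is a hypothetical helper, not part of pandas. Because the rule is mechanical, it produces `test_score_math` rather than the hand-picked `math_score`.

```python
import re

import numpy as np
import pandas as pd

df = pd.DataFrame({'Student ID': [101, 102, 103, 104],
                   'Test Score (Math)': [85, 92, np.nan, 78],
                   'Test Score (English)': [76, 88, 95, 80],
                   'attendance %': [90, 95, 85, 92]})


def to_snake_case(name):
    """Hypothetical cleanup rule: lowercase, '%' -> 'pct', and any run of
    non-alphanumeric characters -> a single underscore."""
    name = name.lower().replace('%', 'pct')
    name = re.sub(r'[^0-9a-z]+', '_', name)
    return name.strip('_')  # drop underscores left by a trailing ')'


# Pandas applies the function to every column label
df_clean = df.rename(columns=to_snake_case)
print(list(df_clean.columns))
# ['student_id', 'test_score_math', 'test_score_english', 'attendance_pct']
```

A function scales better when a file has dozens of messy headers. A dictionary gives you exact control over each individual name.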
You don't have to rename all columns at once. If you only provide a mapping for a subset of columns, only those columns will be renamed.
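For example, the sketch below renames only the attendance column. It also shows a related detail of `.rename()`. By default, dictionary keys that match no existing column are silently ignored. A typo in an old name therefore goes unnoticed unless you pass `errors='raise'`, which raises a `KeyError` instead. The misspelled `'Atendance %'` is deliberate.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Student ID': [101, 102, 103, 104],
                   'Test Score (Math)': [85, 92, np.nan, 78],
                   'Test Score (English)': [76, 88, 95, 80],
                   'attendance %': [90, 95, 85, 92]})

# Rename a single column; the other three keep their original names
df_partial = df.rename(columns={'attendance %': 'attendance_pct'})
print(list(df_partial.columns))
# ['Student ID', 'Test Score (Math)', 'Test Score (English)', 'attendance_pct']

# A key that matches no column is ignored by default, so nothing changes
df_typo = df.rename(columns={'Atendance %': 'attendance_pct'})
print(list(df_typo.columns) == list(df.columns))  # True

# With errors='raise', the same typo raises a KeyError instead
raised = False
try:
    df.rename(columns={'Atendance %': 'attendance_pct'}, errors='raise')
except KeyError as exc:
    raised = True
    print('KeyError:', exc)
```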
By default, `.rename()` returns a new DataFrame with the updated names and leaves the original DataFrame unchanged. This is generally safer because it prevents accidental modification of your original data.
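You can confirm this by comparing the two objects after the call. The sketch below rebuilds the sample data so it runs on its own. It compares values with `np.array_equal(..., equal_nan=True)` so that the missing math score counts as equal in both frames.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Student ID': [101, 102, 103, 104],
                   'Test Score (Math)': [85, 92, np.nan, 78],
                   'Test Score (English)': [76, 88, 95, 80],
                   'attendance %': [90, 95, 85, 92]})
rename_map = {'Student ID': 'student_id',
              'Test Score (Math)': 'math_score',
              'Test Score (English)': 'english_score',
              'attendance %': 'attendance_pct'}

df_renamed = df.rename(columns=rename_map)

# The original keeps its old labels; the returned object has the new ones
print(list(df.columns))
print(list(df_renamed.columns))

# Only the labels changed: the values are identical (NaN treated as equal)
same_values = np.array_equal(df.to_numpy(), df_renamed.to_numpy(), equal_nan=True)
print(same_values)  # True
```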
However, if you are certain you want to modify the DataFrame directly, you can pass the `inplace=True` argument:
```python
# Create a copy to demonstrate inplace modification
df_copy = df.copy()
print("\nOriginal DataFrame (copy):")
print(df_copy)

# Rename columns in place (modifies df_copy directly)
df_copy.rename(columns=rename_map, inplace=True)
print("\nDataFrame after inplace renaming:")
print(df_copy)
# Note: df_copy is now modified; df_renamed was created as a new object earlier
```
Output:
```text
Original DataFrame (copy):
   Student ID  Test Score (Math)  Test Score (English)  attendance %
0         101               85.0                    76            90
1         102               92.0                    88            95
2         103                NaN                    95            85
3         104               78.0                    80            92

DataFrame after inplace renaming:
   student_id  math_score  english_score  attendance_pct
0         101        85.0             76              90
1         102        92.0             88              95
2         103         NaN             95              85
3         104        78.0             80              92
```
Using `inplace=True` can make code slightly shorter, but use it with caution. The call returns `None` instead of a new DataFrame. Objects that are modified directly are also harder to track, especially in longer analysis scripts or notebooks.
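A common slip with `inplace=True` is assigning its result back to the variable. Because the in-place call returns `None`, that assignment throws the DataFrame away. The sketch below uses small two-column frames as stand-ins for the sample data.

```python
import pandas as pd

df = pd.DataFrame({'Student ID': [101, 102], 'attendance %': [90, 95]})

# inplace=True modifies df itself and returns None
result = df.rename(columns={'attendance %': 'attendance_pct'}, inplace=True)
print(result)            # None
print(list(df.columns))  # ['Student ID', 'attendance_pct']

# The mistake: reassigning the return value of an inplace call
df2 = pd.DataFrame({'Student ID': [101, 102]})
df2 = df2.rename(columns={'Student ID': 'student_id'}, inplace=True)
print(df2)               # None: the reference to the DataFrame is lost
```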
The `.rename()` method can also rename index labels through its `index` parameter. This works like the `columns` parameter and accepts a dictionary that maps old index labels to new ones.
```python
# Example with index renaming (assuming df_renamed from before)
# Let's set student_id as the index first
df_indexed = df_renamed.set_index('student_id')
print("\nDataFrame with student_id as index:")
print(df_indexed)

# Rename specific index labels
index_rename_map = {101: 'S101', 104: 'S104'}
df_index_renamed = df_indexed.rename(index=index_rename_map)
print("\nDataFrame after renaming index labels:")
print(df_index_renamed)
```
Output:
```text
DataFrame with student_id as index:
            math_score  english_score  attendance_pct
student_id
101               85.0             76              90
102               92.0             88              95
103                NaN             95              85
104               78.0             80              92

DataFrame after renaming index labels:
            math_score  english_score  attendance_pct
student_id
S101              85.0             76              90
102               92.0             88              95
103                NaN             95              85
S104              78.0             80              92
```
### Assigning to `df.columns`
If you need to rename all columns and you know their new names in the correct order, you can assign a list of new names directly to the DataFrame's `.columns` attribute.
```python
# Make sure the list length matches the number of columns
new_column_names = ['id', 'score_math', 'score_english', 'attendance']

# Create another copy to demonstrate this method
df_copy2 = df.copy()

# Assign the new list to df.columns
df_copy2.columns = new_column_names
print("\nDataFrame after assigning to df.columns:")
print(df_copy2)
```
Output:
```text
DataFrame after assigning to df.columns:
    id  score_math  score_english  attendance
0  101        85.0             76          90
1  102        92.0             88          95
2  103         NaN             95          85
3  104        78.0             80          92
```
This method is more direct but less flexible than `.rename()`. You must provide a name for every column, and the list length must exactly match the number of columns in the DataFrame. Otherwise, pandas raises a `ValueError`. Direct assignment is best suited to creating a DataFrame or overhauling all of its column names at once. For targeted renaming, `.rename()` is usually the preferred approach.
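To see the length requirement in action, the sketch below assigns three names to the four-column sample DataFrame. Pandas rejects the list with a `ValueError` and leaves the existing names untouched. The exact wording of the error message may vary between pandas versions, so the code only checks the exception type.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Student ID': [101, 102, 103, 104],
                   'Test Score (Math)': [85, 92, np.nan, 78],
                   'Test Score (English)': [76, 88, 95, 80],
                   'attendance %': [90, 95, 85, 92]})

df_copy3 = df.copy()
raised = False
try:
    # Only three names for a four-column DataFrame
    df_copy3.columns = ['id', 'score_math', 'score_english']
except ValueError as exc:
    raised = True
    print('ValueError:', exc)

# The failed assignment leaves the existing column names in place
print(list(df_copy3.columns))
```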
Renaming columns is a simple yet effective way to improve the clarity and usability of your DataFrames, making subsequent analysis steps smoother and your code easier to read.
© 2025 ApX Machine Learning