Renaming Columns

As you clean and prepare your data, you'll often find that the original column names are not ideal. They might be unclear or too long, contain spaces or special characters that are awkward to work with in code, or fail to follow a consistent naming convention. Renaming columns is a frequent and important step in making your DataFrame easier to understand and use.

Pandas provides the flexible .rename() method specifically for this purpose. It allows you to change column names (and index labels) without altering the data itself.

Using the `.rename()` Method

The most common way to use .rename() is by passing a dictionary to its columns parameter. This dictionary should map the old column names (the keys) to the new column names (the values).

Let's start with a sample DataFrame:

import pandas as pd
import numpy as np

# Sample DataFrame with less-than-ideal column names
data = {'Student ID': [101, 102, 103, 104],
        'Test Score (Math)': [85, 92, np.nan, 78],
        'Test Score (English)': [76, 88, 95, 80],
        'attendance %': [90, 95, 85, 92]}
df = pd.DataFrame(data)

print("Original DataFrame:")
print(df)

This will output:

Original DataFrame:
   Student ID  Test Score (Math)  Test Score (English)  attendance %
0         101               85.0                    76            90
1         102               92.0                    88            95
2         103                NaN                    95            85
3         104               78.0                    80            92

Notice the column names contain spaces, parentheses, and symbols like '%'. Let's rename them to be more programming-friendly, using lowercase and underscores.

# Define the renaming map
rename_map = {
    'Student ID': 'student_id',
    'Test Score (Math)': 'math_score',
    'Test Score (English)': 'english_score',
    'attendance %': 'attendance_pct'
}

# Use .rename() with the columns parameter
df_renamed = df.rename(columns=rename_map)

print("\nDataFrame after renaming:")
print(df_renamed)

The output shows the new column names:

DataFrame after renaming:
   student_id  math_score  english_score  attendance_pct
0         101        85.0             76              90
1         102        92.0             88              95
2         103         NaN             95              85
3         104        78.0             80              92

You don't have to rename all columns at once. If you only provide a mapping for a subset of columns, only those columns will be renamed.
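
As a minimal sketch of partial renaming, the snippet below builds a small frame and renames only one of its columns. The variable names (`scores`, `partial`, `unchanged`) and the misspelled key are illustrative assumptions, not part of the example above. It also shows the `errors` parameter of `.rename()`: by default, keys that match no column are silently ignored, while `errors='raise'` turns them into a `KeyError`.

```python
import pandas as pd

# Illustrative data; the variable names here are assumptions for this sketch
scores = pd.DataFrame({'Student ID': [101, 102],
                       'Test Score (Math)': [85, 92],
                       'attendance %': [90, 95]})

# Only 'Student ID' is mapped, so the other columns keep their names
partial = scores.rename(columns={'Student ID': 'student_id'})
print(list(partial.columns))

# A key that matches no column is ignored by default (errors='ignore')
unchanged = scores.rename(columns={'Studnt ID': 'student_id'})
print(list(unchanged.columns))

# errors='raise' surfaces the typo instead of silently doing nothing
try:
    scores.rename(columns={'Studnt ID': 'student_id'}, errors='raise')
    typo_detected = False
except KeyError:
    typo_detected = True
print(typo_detected)
```

Passing `errors='raise'` is a reasonable safeguard when the mapping is typed by hand, since a misspelled key would otherwise leave the column name unchanged without any warning.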

Modifying the DataFrame In Place

By default, .rename() returns a new DataFrame with the updated names, leaving the original DataFrame unchanged. This is generally safer as it prevents accidental modification of your original data.

However, if you are certain you want to modify the DataFrame directly, you can use the inplace=True argument:

# Create a copy to demonstrate inplace modification
df_copy = df.copy()

print("\nOriginal DataFrame (copy):")
print(df_copy)

# Rename columns inplace
df_copy.rename(columns=rename_map, inplace=True)

print("\nDataFrame after inplace renaming:")
print(df_copy)
# Note: df_copy is now modified, df_renamed was created as a new object earlier

Output:

Original DataFrame (copy):
   Student ID  Test Score (Math)  Test Score (English)  attendance %
0         101               85.0                    76            90
1         102               92.0                    88            95
2         103                NaN                    95            85
3         104               78.0                    80            92

DataFrame after inplace renaming:
   student_id  math_score  english_score  attendance_pct
0         101        85.0             76              90
1         102        92.0             88              95
2         103         NaN             95              85
3         104        78.0             80              92

Using inplace=True can sometimes make code slightly shorter, but use it with caution. It's often harder to track changes when objects are modified directly, especially in longer analysis scripts or notebooks.
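
To make the tracking problem concrete, here is a small sketch with two hypothetical helper functions (`summarize_in_place` and `summarize_with_copy` are invented for illustration). Both count the student IDs, but only the in-place version changes the DataFrame that the caller passed in.

```python
import pandas as pd

def summarize_in_place(frame):
    # Hypothetical helper: renames in place, so the caller's object changes too
    frame.rename(columns={'Student ID': 'student_id'}, inplace=True)
    return frame['student_id'].count()

def summarize_with_copy(frame):
    # Hypothetical helper: works on the new DataFrame that rename() returns
    renamed = frame.rename(columns={'Student ID': 'student_id'})
    return renamed['student_id'].count()

original = pd.DataFrame({'Student ID': [101, 102, 103]})

# The copy-based helper leaves the caller's column names alone
summarize_with_copy(original)
cols_after_copy = list(original.columns)
print(cols_after_copy)

# The in-place helper silently renames the caller's columns as a side effect
summarize_in_place(original)
cols_after_inplace = list(original.columns)
print(cols_after_inplace)
```

After the second call, any later code that still refers to `'Student ID'` would fail, even though nothing at the call site hints that the column names changed.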

Renaming Index Labels

The .rename() method can also rename index labels through its index parameter. It works like the columns parameter: pass a dictionary that maps old index labels to new ones. Labels that are not in the dictionary stay as they are.

# Example with index renaming (assuming df_renamed from before)
# Let's set student_id as the index first
df_indexed = df_renamed.set_index('student_id')
print("\nDataFrame with student_id as index:")
print(df_indexed)

# Rename specific index labels
index_rename_map = {101: 'S101', 104: 'S104'}
df_index_renamed = df_indexed.rename(index=index_rename_map)

print("\nDataFrame after renaming index labels:")
print(df_index_renamed)

Output:

DataFrame with student_id as index:
            math_score  english_score  attendance_pct
student_id                                           
101               85.0             76              90
102               92.0             88              95
103                NaN             95              85
104               78.0             80              92

DataFrame after renaming index labels:
            math_score  english_score  attendance_pct
student_id                                           
S101              85.0             76              90
102               92.0             88              95
103                NaN             95              85
S104              78.0             80              92
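
The columns and index parameters can also be passed in the same call. The sketch below assumes a small illustrative frame (`grades` and `relabeled` are names made up for this example) and renames one column and one index label at once.

```python
import pandas as pd

# Illustrative frame indexed by student ID (names are assumptions for this sketch)
grades = pd.DataFrame({'Test Score (Math)': [85.0, 92.0]}, index=[101, 102])

# columns and index mappings can be supplied together in one rename() call
relabeled = grades.rename(columns={'Test Score (Math)': 'math_score'},
                          index={101: 'S101'})
print(relabeled)
```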

Alternative: Assigning to `df.columns`

If you need to rename all columns and you know their new names in the correct order, you can directly assign a list of new names to the DataFrame's .columns attribute.

# Make sure the list length matches the number of columns
new_column_names = ['id', 'score_math', 'score_english', 'attendance']

# Create another copy to demonstrate this method
df_copy2 = df.copy()

# Assign the new list to df.columns
df_copy2.columns = new_column_names

print("\nDataFrame after assigning to df.columns:")
print(df_copy2)

Output:

DataFrame after assigning to df.columns:
    id  score_math  score_english  attendance
0  101        85.0             76          90
1  102        92.0             88          95
2  103         NaN             95          85
3  104        78.0             80          92

This method is more direct but less flexible than .rename(). You must provide a name for every column, and the list length must exactly match the number of columns in the DataFrame; otherwise, pandas raises a ValueError. The names are also applied by position, so a list in the wrong order mislabels columns without raising any error. This approach is better suited to situations where you are creating a DataFrame or doing a complete overhaul of the column names. For targeted renaming, .rename() is usually the preferred approach.
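
The following sketch illustrates both constraints on a tiny frame (`table` is an illustrative name, not part of the example above): a list of the wrong length is rejected, while a list of the right length but the wrong order is accepted as-is.

```python
import pandas as pd

table = pd.DataFrame({'Student ID': [101], 'attendance %': [90]})

# Too few names: pandas refuses the assignment with a ValueError
try:
    table.columns = ['id']
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
print(mismatch_raised)

# The failed assignment leaves the original names in place
print(list(table.columns))

# Names are applied by position, so a wrong order mislabels data without any error
table.columns = ['attendance', 'id']
print(table)
```

In the last step the student ID 101 ends up under the label `attendance`, which is why checking the column order before assigning is worthwhile.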

Renaming columns is a simple yet effective way to improve the clarity and usability of your DataFrames, making subsequent analysis steps smoother and your code easier to read.
