Sometimes, the linking information for combining DataFrames is located in their indexes, rather than in common column values. Pandas offers flexible methods to handle these scenarios, utilizing both the pd.merge function and the specialized .join method.
Imagine you have two datasets: one containing employee details indexed by an employee ID, and another containing performance review scores, also indexed by the same employee ID. To combine these, you'd want to align them based on their indices.
pd.merge with Index FlagsThe pd.merge function accommodates index-based merging through the left_index and right_index boolean arguments.
left_index=True: Use the index of the left DataFrame as its join key(s).right_index=True: Use the index of the right DataFrame as its join key(s).Let's create two simple DataFrames to illustrate:
import pandas as pd
# DataFrame 1: Employee information indexed by Employee ID
employees = pd.DataFrame({'name': ['Alice', 'Bob', 'Charlie'],
'department': ['HR', 'Engineering', 'Sales']},
index=['E101', 'E102', 'E103'])
# DataFrame 2: Salary information indexed by Employee ID
salaries = pd.DataFrame({'salary': [70000, 85000, 78000]},
index=['E101', 'E102', 'E103'])
print("Employees DataFrame:")
print(employees)
print("\nSalaries DataFrame:")
print(salaries)
Output:
Employees DataFrame:
name department
E101 Alice HR
E102 Bob Engineering
E103 Charlie Sales
Salaries DataFrame:
salary
E101 70000
E102 85000
E103 78000
To merge these based on their common index (Employee ID), we set both left_index and right_index to True:
# Merge using the index of both DataFrames
employee_data = pd.merge(employees, salaries, left_index=True, right_index=True)
print("\nMerged DataFrame (using index):")
print(employee_data)
Output:
Merged DataFrame (using index):
name department salary
E101 Alice HR 70000
E102 Bob Engineering 85000
E103 Charlie Sales 78000
The merge correctly aligns rows based on the shared index values ('E101', 'E102', 'E103').
You can also mix index and column keys. Suppose the salary information used a column named emp_id instead of the index:
# Salary information with an 'emp_id' column
salaries_col = pd.DataFrame({'emp_id': ['E101', 'E102', 'E104'], # Note: E104 instead of E103
'salary': [70000, 85000, 92000]})
print("\nSalaries DataFrame (with column):")
print(salaries_col)
# Merge employees (using index) with salaries_col (using 'emp_id' column)
employee_data_mixed = pd.merge(employees, salaries_col,
left_index=True, right_on='emp_id',
how='left') # Using left join to keep all employees
print("\nMerged DataFrame (mixed index/column, left join):")
print(employee_data_mixed)
Output:
Salaries DataFrame (with column):
emp_id salary
0 E101 70000
1 E102 85000
2 E104 92000
Merged DataFrame (mixed index/column, left join):
name department emp_id salary
E101 Alice HR E101 70000.0
E102 Bob Engineering E102 85000.0
E103 Charlie Sales NaN NaN
Here, left_index=True tells merge to use the index from the employees DataFrame, while right_on='emp_id' specifies that the emp_id column from salaries_col should be used as the key for the right DataFrame. We used a how='left' join to keep all employees from the left DataFrame (employees); notice that 'Charlie' (E103) has NaN values for emp_id and salary because 'E103' was not present in the salaries_col's emp_id column. Similarly, 'E104' from salaries_col was dropped because it didn't match any index in employees during the left join.
.join() Method for Index-Based MergesPandas DataFrames have a convenience method, .join(), specifically designed for merging based on indices. It performs a left join by default but can be configured for other join types using the how argument.
Its syntax is often more concise for index merges: left_df.join(right_df, how='...')
Let's revisit the first example using .join():
# Using .join() for index-on-index merge (defaults to left join)
employee_data_join = employees.join(salaries)
print("\nMerged DataFrame (using .join()):")
print(employee_data_join)
Output:
Merged DataFrame (using .join()):
name department salary
E101 Alice HR 70000
E102 Bob Engineering 85000
E103 Charlie Sales 78000
This produces the same result as the pd.merge call with left_index=True and right_index=True (using the default inner join implicitly in the first merge example, or explicitly specifying how='left' which works identically here since indices match perfectly).
The .join() method can also join a DataFrame's index with a column in another DataFrame if you pass the column name to the on argument:
# Need to set index on salaries_col for this specific join example
salaries_col_indexed = salaries_col.set_index('emp_id')
print("\nSalaries DataFrame (indexed by emp_id):")
print(salaries_col_indexed)
# Joining employees (index) on salaries_col_indexed (index)
# This is equivalent to the first .join() example
employee_data_join_indexed = employees.join(salaries_col_indexed, how='inner') # Inner join
print("\nMerged DataFrame (using .join() on indices, inner):")
print(employee_data_join_indexed)
# Joining employees index with salaries_col 'emp_id' column
# Note: This specific use case is less common for .join() compared to pd.merge
# It requires the column 'emp_id' to exist in the *left* DataFrame (employees)
# Let's add it temporarily to show syntax (not a typical use)
employees_with_id = employees.reset_index().rename(columns={'index': 'emp_id'})
print("\nEmployees with emp_id column:")
print(employees_with_id)
# Need salaries *not* indexed for this join on column 'emp_id'
print("\nSalaries with emp_id column (original):")
print(salaries_col)
# Perform join using the 'emp_id' column in employees_with_id
# The right DataFrame (salaries_col) must be indexed appropriately
# for this to work as intended. Let's re-index salaries_col:
employee_data_join_on = employees_with_id.join(salaries_col.set_index('emp_id'), on='emp_id', how='left')
print("\nMerged DataFrame (using .join() with 'on'):")
print(employee_data_join_on)
Output:
Salaries DataFrame (indexed by emp_id):
salary
emp_id
E101 70000
E102 85000
E104 92000
Merged DataFrame (using .join() on indices, inner):
name department salary
E101 Alice HR 70000
E102 Bob Engineering 85000
Employees with emp_id column:
emp_id name department
0 E101 Alice HR
1 E102 Bob Engineering
2 E103 Charlie Sales
Salaries with emp_id column (original):
emp_id salary
0 E101 70000
1 E102 85000
2 E104 92000
Merged DataFrame (using .join() with 'on'):
emp_id name department salary
0 E101 Alice HR 70000.0
1 E102 Bob Engineering 85000.0
2 E103 Charlie Sales NaN
When to use pd.merge vs .join:
pd.merge() when you need the most flexibility: merging columns on columns, indices on indices, or mixing columns and indices. It's the general-purpose merging function..join() as a convenient shorthand primarily when merging based on indices (index of left DataFrame with index of right DataFrame). It defaults to a left join, which is common. While it can join on columns using the on keyword, pd.merge is often clearer for column-based or mixed merges.Mastering index-based merging is important for efficiently combining datasets where unique identifiers are stored as index labels, a common pattern in time-series data and other structured datasets.
Was this section helpful?
pd.merge and DataFrame.join methods with comprehensive examples.merge and join.© 2026 ApX Machine LearningEngineered with