One of the most straightforward ways to combine DataFrames is by concatenation. Think of it like stacking pieces of paper (your DataFrames) either one on top of the other (appending rows) or side-by-side (adding columns). Pandas provides the pd.concat() function for this purpose.

The pd.concat() function takes a list or sequence of Series or DataFrame objects as its primary argument and intelligently sticks them together.

Concatenating Along Rows (Stacking Vertically)

By default, pd.concat() stacks DataFrames vertically, meaning it appends rows from the second DataFrame to the end of the first, the third to the end of the second, and so on. This happens along axis=0.

Let's create two simple DataFrames to illustrate:

import pandas as pd

# Create the first DataFrame
df1 = pd.DataFrame({'A': ['A0', 'A1'],
                    'B': ['B0', 'B1']},
                   index=[0, 1])

# Create the second DataFrame
df2 = pd.DataFrame({'A': ['A2', 'A3'],
                    'B': ['B2', 'B3']},
                   index=[2, 3])

print("DataFrame 1:")
print(df1)
print("\nDataFrame 2:")
print(df2)

# Concatenate along rows (default behavior)
result_rows = pd.concat([df1, df2])

print("\nConcatenated DataFrame (Rows):")
print(result_rows)

Output:

DataFrame 1:
    A   B
0  A0  B0
1  A1  B1

DataFrame 2:
    A   B
2  A2  B2
3  A3  B3

Concatenated DataFrame (Rows):
    A   B
0  A0  B0
1  A1  B1
2  A2  B2
3  A3  B3

As you can see, df2's rows were appended directly below df1's rows. The columns were aligned automatically because both DataFrames had the same column names ('A' and 'B').

Handling Indices During Row Concatenation

Notice the index in the result_rows DataFrame: [0, 1, 2, 3]. In this case, the original indices were unique and combined nicely. However, what if the original DataFrames had overlapping index labels?

Let's modify df2 to have overlapping indices with df1:

# Create df1 again
df1 = pd.DataFrame({'A': ['A0', 'A1'],
                    'B': ['B0', 'B1']},
                   index=[0, 1])

# Create df2 with overlapping index
df2_overlap = pd.DataFrame({'A': ['A2', 'A3'],
                            'B': ['B2', 'B3']},
                           index=[1, 2]) # Index 1 overlaps with df1

print("DataFrame 1:")
print(df1)
print("\nDataFrame 2 (Overlapping Index):")
print(df2_overlap)

# Concatenate with overlapping indices
result_overlap = pd.concat([df1, df2_overlap])

print("\nConcatenated (Overlapping Index):")
print(result_overlap)
print("\nIndex of the result:")
print(result_overlap.index)

Output:

DataFrame 1:
    A   B
0  A0  B0
1  A1  B1

DataFrame 2 (Overlapping Index):
    A   B
1  A2  B2
2  A3  B3

Concatenated (Overlapping Index):
    A   B
0  A0  B0
1  A1  B1
1  A2  B2  # Duplicate index label 1
2  A3  B3

Index of the result:
Index([0, 1, 1, 2], dtype='int64')

By default, pd.concat() preserves the original indices, even if this results in duplicates (like index 1 in the example above). While Pandas allows duplicate indices, they can sometimes make selecting data by label (.loc) ambiguous or lead to unexpected behavior in later operations.

If you don't need to preserve the original indices and prefer a clean, unique index for the resulting DataFrame, you can use the ignore_index=True argument:

# Concatenate with overlapping indices, ignoring original index
result_ignore_index = pd.concat([df1, df2_overlap], ignore_index=True)

print("\nConcatenated (Ignoring Index):")
print(result_ignore_index)
print("\nIndex of the result (Ignored):")
print(result_ignore_index.index)

Output:

Concatenated (Ignoring Index):
    A   B
0  A0  B0
1  A1  B1
2  A2  B2
3  A3  B3

Index of the result (Ignored):
RangeIndex(start=0, stop=4, step=1)

Setting ignore_index=True discards the original indices and assigns a new default integer index (a RangeIndex) to the concatenated DataFrame.

Concatenating Along Columns (Stacking Horizontally)

You can also concatenate DataFrames side-by-side by specifying axis=1. This aligns DataFrames based on their index labels and places their columns next to each other.

Let's create two DataFrames with the same index but different columns:

# Create df1 again
df1 = pd.DataFrame({'A': ['A0', 'A1'],
                    'B': ['B0', 'B1']},
                   index=[0, 1])

# Create df3 with different columns but same index
df3 = pd.DataFrame({'C': ['C0', 'C1'],
                    'D': ['D0', 'D1']},
                   index=[0, 1])

print("DataFrame 1:")
print(df1)
print("\nDataFrame 3:")
print(df3)

# Concatenate along columns (axis=1)
result_cols = pd.concat([df1, df3], axis=1)

print("\nConcatenated DataFrame (Columns):")
print(result_cols)

Output:

DataFrame 1:
    A   B
0  A0  B0
1  A1  B1

DataFrame 3:
    C   D
0  C0  D0
1  C1  D1

Concatenated DataFrame (Columns):
    A   B   C   D
0  A0  B0  C0  D0
1  A1  B1  C1  D1

Here, pd.concat used the index (0 and 1) to align the rows and placed the columns of df3 next to the columns of df1.

Below is a diagram illustrating the difference between concatenating along axis=0 (rows) and axis=1 (columns).

Diagram illustrating vertical (axis=0) and horizontal (axis=1) concatenation.

Handling Alignment and Missing Data (Join Logic)

What happens when concatenating DataFrames that don't perfectly align on the axis not being concatenated? For example, concatenating side-by-side (axis=1) when the DataFrames have different row indices, or stacking vertically (axis=0) when they have different columns.

This is controlled by the join parameter, which works similarly to database joins:

join='outer' (Default): Takes the union of the indices (or columns). It keeps all labels from both DataFrames. If a label exists in one DataFrame but not the other, the missing values are filled with NaN (Not a Number).
join='inner': Takes the intersection of the indices (or columns). It keeps only the labels that are present in both DataFrames.

Let's see this with column concatenation (axis=1) where the DataFrames have different indices:

# df1 has index [0, 1]
print("DataFrame 1:")
print(df1)

# Create df4 with different index [1, 2]
df4 = pd.DataFrame({'C': ['C1', 'C2'],
                    'D': ['D1', 'D2']},
                   index=[1, 2])
print("\nDataFrame 4:")
print(df4)

# Concatenate columns with outer join (default)
result_outer = pd.concat([df1, df4], axis=1, join='outer') # or just pd.concat([df1, df4], axis=1)
print("\nConcatenated Columns (Outer Join):")
print(result_outer)

# Concatenate columns with inner join
result_inner = pd.concat([df1, df4], axis=1, join='inner')
print("\nConcatenated Columns (Inner Join):")
print(result_inner)

Output:

DataFrame 1:
    A   B
0  A0  B0
1  A1  B1

DataFrame 4:
    C   D
1  C1  D1
2  C2  D2

Concatenated Columns (Outer Join):
      A    B    C    D
0   A0   B0  NaN  NaN
1   A1   B1   C1   D1
2  NaN  NaN   C2   D2

Concatenated Columns (Inner Join):
    A   B   C   D
1  A1  B1  C1  D1

Outer Join: The resulting DataFrame has indices 0, 1, and 2 (the union of [0, 1] and [1, 2]). Since index 0 only exists in df1, the 'C' and 'D' columns have NaN for that row. Similarly, since index 2 only exists in df4, the 'A' and 'B' columns have NaN for that row. Index 1 exists in both, so all columns have values.
Inner Join: The resulting DataFrame only includes index 1, as it's the only index label present in both df1 and df4.

The same join logic applies when concatenating rows (axis=0) using DataFrames with different column names. An outer join would include all columns from both DataFrames, filling missing values with NaN, while an inner join would only keep columns common to both.

Concatenation using pd.concat is a fundamental tool for assembling larger datasets from smaller pieces. Understanding how to control the axis, handle indices, and manage alignment with joins is important for combining your data correctly.