Just as you can load data from various file types, Pandas provides straightforward methods to save your DataFrames into different formats. While CSV files are excellent for simplicity and interoperability between systems, Microsoft Excel files (.xlsx
or the older .xls
) are very common in business settings and allow for features like multiple sheets within a single file.
Pandas makes writing DataFrames to Excel files simple using the to_excel()
method, which works analogously to the to_csv()
method we saw previously.
Let's start with a basic example. Suppose we have processed some data and stored it in a DataFrame:
import pandas as pd
# Sample DataFrame
data = {'Product': ['Widget A', 'Widget B', 'Gadget C', 'Widget A'],
'Region': ['North', 'South', 'North', 'West'],
'Sales': [150, 200, 120, 180],
'Profit': [30, 45, 25, 38]}
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)
# Save the DataFrame to an Excel file
# This creates 'output_data.xlsx' in the same directory as your script/notebook
df.to_excel('output_data.xlsx')
print("\nDataFrame successfully saved to output_data.xlsx")
If you run this code and then open the generated output_data.xlsx
file using spreadsheet software like Microsoft Excel, LibreOffice Calc, or Google Sheets, you'll see your DataFrame represented as a table.
The to_excel()
method offers several parameters to customize the output file. Here are some of the most commonly used ones:
excel_writer
(File Path): This is the first argument and specifies the file path and name for the output Excel file (e.g., 'data/processed_sales.xlsx'
, 'report.xlsx'
).
sheet_name
: By default, the DataFrame is written to a sheet named 'Sheet1'. You can specify a different name using this parameter. For example, df.to_excel('report.xlsx', sheet_name='SalesData')
.
index
: Just like with to_csv()
, to_excel()
writes the DataFrame's index as the first column in the Excel sheet by default. Often, this index is just the default sequential integer index (0, 1, 2...) and isn't meaningful data you want in the final spreadsheet. To prevent writing the index, set index=False
:
# Save without the DataFrame index
df.to_excel('output_data_no_index.xlsx', index=False)
print("DataFrame saved to output_data_no_index.xlsx without the index column.")
Setting index=False
is a very common practice when exporting data for others who might not be familiar with Pandas indices.
columns
: If you only want to save specific columns from your DataFrame, you can provide a list of column names to the columns
parameter:
# Save only the 'Product' and 'Sales' columns
df.to_excel('output_subset.xlsx', columns=['Product', 'Sales'], index=False)
print("Saved only Product and Sales columns to output_subset.xlsx")
header
: This parameter controls whether the column names (the header row) are written to the file. It defaults to True
. Set header=False
if you don't want the header row included.
startrow
and startcol
: These allow you to specify the top-left cell where the DataFrame should be written within the sheet (0-indexed). For instance, startrow=1, startcol=2
would start writing the DataFrame (including its header, if applicable) at cell C2. This can be useful when adding data to existing sheets or creating more complex layouts.
A significant advantage of Excel files over simple CSVs is their ability to contain multiple worksheets. Pandas allows you to write several DataFrames to different sheets within the same .xlsx
file using the pd.ExcelWriter
object. This is typically done using a with
statement, which ensures the file is properly saved and closed.
Here's how you can do it:
import pandas as pd
# Create two sample DataFrames
df_sales = pd.DataFrame({
'Region': ['North', 'South', 'East', 'West'],
'Sales': [1000, 1500, 1200, 1800]
})
df_inventory = pd.DataFrame({
'Product ID': ['P101', 'P102', 'P103', 'P104'],
'Stock': [50, 75, 30, 90],
'Warehouse': ['WH-A', 'WH-B', 'WH-A', 'WH-C']
})
# Use ExcelWriter to save both DataFrames to different sheets in the same file
output_filename = 'multi_sheet_report.xlsx'
with pd.ExcelWriter(output_filename) as writer:
df_sales.to_excel(writer, sheet_name='Sales Summary', index=False)
df_inventory.to_excel(writer, sheet_name='Inventory Levels', index=False)
print(f"Both DataFrames saved to {output_filename}")
If you open multi_sheet_report.xlsx
, you will find two sheets named "Sales Summary" and "Inventory Levels", each containing the data from the respective DataFrame.
To write Excel files, Pandas relies on external libraries called "engines".
.xlsx
files (recommended), you typically need the openpyxl
library..xls
files, the xlwt
library was used.If you attempt to use to_excel()
and don't have the necessary engine installed, Pandas will usually raise an ImportError
with instructions on how to install it. You can install openpyxl
using pip:
pip install openpyxl
Or using conda:
conda install openpyxl
Saving DataFrames to Excel files using df.to_excel()
is a valuable skill for sharing your analysis results or processed data, especially in environments where spreadsheets are the standard tool for data viewing and manipulation. Remember the index=False
option for cleaner output and consider using pd.ExcelWriter
when you need to organize related data into multiple sheets within a single file. While Excel is convenient, keep in mind that for very large datasets, writing can be slower compared to formats like CSV or more specialized binary formats.
© 2025 ApX Machine Learning