I need to export 24 pandas DataFrames (140 columns x 400 rows each) to Excel, each into a different sheet.
I am using pandas’ built-in ExcelWriter. Running 24 scenarios, it takes:
51 seconds to write to an .xls file
86 seconds to write to an .xlsx file
141 seconds to write to an .xlsm file
21 seconds to just run the program (no Excel output)
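For reference, the setup described above might look roughly like the following. This is only a sketch: the `write_frames` helper, the sheet names, and the random stand-in data are hypothetical, and the engine is left to pandas unless one is passed explicitly.

```python
import numpy as np
import pandas as pd


def write_frames(frames, path, engine=None):
    """Write each DataFrame in `frames` (dict: sheet name -> DataFrame)
    to its own sheet of a single workbook at `path`."""
    # engine=None lets pandas pick an installed writer for the file extension;
    # "xlsxwriter" or "openpyxl" can be passed explicitly if installed.
    with pd.ExcelWriter(path, engine=engine) as writer:
        for sheet_name, frame in frames.items():
            frame.to_excel(writer, sheet_name=sheet_name, index=False)


# Hypothetical stand-in data: 24 frames of 400 rows x 140 columns.
rng = np.random.default_rng(0)
frames = {
    f"scenario_{i:02d}": pd.DataFrame(rng.random((400, 140)))
    for i in range(24)
}
```

Calling `write_frames(frames, "scenarios.xlsx", engine="xlsxwriter")` would then write all 24 sheets in one pass, assuming the xlsxwriter package is installed.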
The problem with writing to .xls is that the spreadsheet contains no formatting styles, so if I open it in Excel, select a column, and click the ‘comma’ button to format the numbers, Excel tells me: ‘style comma not found’. I don’t get this problem when writing to .xlsx, but that’s even slower.
Any suggestions on how to make the exporting faster?
I can’t be the first one to have this problem, yet after hours of searching forums and websites I haven’t found a definitive solution.
The only thing I can think of is to use Python to export to CSV files, and then write an Excel macro to merge all the CSVs into a single spreadsheet.
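The Python half of that CSV workaround could be sketched as below. The `export_frames_to_csv` helper and the one-file-per-frame naming are assumptions for illustration; the Excel macro that merges the files is not shown.

```python
import os

import pandas as pd


def export_frames_to_csv(frames, out_dir):
    """Write each DataFrame in `frames` (dict: name -> DataFrame) to
    `<out_dir>/<name>.csv` and return the list of paths written."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for name, frame in frames.items():
        path = os.path.join(out_dir, f"{name}.csv")
        # index=False keeps the row index out of the file.
        frame.to_csv(path, index=False)
        paths.append(path)
    return paths
```

Note that CSV carries no number formatting, so any styling would still have to be applied on the Excel side.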
The .xls file is 10 MB, and the .xlsx file is 5.2 MB.
Here is a benchmark for different Python to Excel modules.
And here is the output for 140 columns x (400 x 24) rows using the latest version of the modules at the time of posting:
Versions:
    python      : 2.7.7
    openpyxl    : 2.0.5
    pyexcelerate: 0.6.3
    xlsxwriter  : 0.5.7
    xlwt        : 0.7.5

Dimensions:
    Rows = 9600 (400 x 24)
    Cols = 140

Times:
    pyexcelerate          :  11.85
    xlwt                  :  17.64
    xlsxwriter (optimised):  21.63
    xlsxwriter            :  26.76
    openpyxl   (optimised):  95.18
    openpyxl              : 119.29
As with any benchmark, the results will depend on the Python and module versions, CPU, RAM, and disk I/O, as well as on the benchmark itself. So make sure to verify these results for your own setup.
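One way to check this on your own machine is a small timing harness along these lines. It is a generic sketch, not the benchmark script referred to above: `time_writers` and `report` are hypothetical helpers, and each writer is assumed to be a zero-argument callable that produces one output file.

```python
import time


def time_writers(writers, repeats=1):
    """Return {label: best wall-clock seconds} for each zero-argument callable."""
    results = {}
    for label, write in writers.items():
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            write()
            best = min(best, time.perf_counter() - start)
        results[label] = best
    return results


def report(results):
    """Format the timings as text, fastest first."""
    lines = []
    for label, seconds in sorted(results.items(), key=lambda item: item[1]):
        lines.append(f"{label:<25}: {seconds:8.2f}")
    return "\n".join(lines)
```

Each entry in `writers` would wrap one module's export of the same data, so that the only thing varying between runs is the writer.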
Also, since you asked specifically about pandas, please note that PyExcelerate isn’t supported as a pandas ExcelWriter engine.