+ 5
Reading and processing Excel data
Hello SoloLearners, I need to process a very large Excel file (multiple data types). Do you have any suggestions on how to read and process this type of file efficiently? I can easily convert the file to .csv if that helps. Any language will be fine (C++, Java or Python are preferred). Please share any code that you have used and that worked well for you. Thanks
9 Answers
+ 10
I've played around with this, and the best by far is xlwings.
www.xlwings.org/
Try it first.
https://www.xlwings.org/examples
+ 9
@Red Hawks, can you please give more details on what you mean by "process" here: what needs to be processed, and what kind of output do you expect from it?
It would also help if you specified the MS Office version, since Microsoft changed its file format in Office 2007, IIRC. Since then Excel has used an XML-based format (.xlsx) instead of the old binary one (.xls), but I guess you already know that : )
+ 8
Thanks @Ipang for offering to help. The Excel file contains both string data and numerical data. We want to run some basic statistics, such as counting all records that meet various criteria (e.g. by gender). For now we are planning to convert the Excel file into CSV. Cheers!
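A minimal sketch of that kind of count in Python, assuming the file has been exported to CSV with a header row. The column names `gender` and `age` and the sample rows are made up for illustration, and the helper names are hypothetical. It reads row by row, so a large file never has to fit in memory.

```python
import csv
import io
from collections import Counter

def count_by_column(rows, column):
    """Count how often each (lower-cased, trimmed) value appears in one column."""
    counts = Counter()
    for row in rows:
        counts[row[column].strip().lower()] += 1
    return counts

def count_matching(rows, predicate):
    """Count rows for which predicate(row) is true."""
    return sum(1 for row in rows if predicate(row))

# Hypothetical sample data standing in for the exported CSV file.
SAMPLE = """name,gender,age
Ann,F,34
Bob,M,27
Cid,M,41
Dee,F,19
"""

# For a real file, use open("data.csv", newline="") instead of io.StringIO(SAMPLE).
gender_counts = count_by_column(csv.DictReader(io.StringIO(SAMPLE)), "gender")
over_30 = count_matching(csv.DictReader(io.StringIO(SAMPLE)),
                         lambda row: int(row["age"]) > 30)
print(gender_counts["f"], gender_counts["m"], over_30)
```

Each pass creates a fresh reader because a CSV reader can only be iterated once; for many criteria it may be cheaper to update several counters inside a single loop.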
+ 7
Thank you @Louis, one of my friends is planning to use Python for this program and your link looks interesting.
+ 6
Thanks @Kinshuk, I will give it a try.
+ 5
Thanks for the info @John
+ 4
This is the starting point for the documentation of Microsoft's Excel binary format. Years ago I did the same thing for Word and ran into bumps where the documentation was hard to figure out.
https://msdn.microsoft.com/en-us/library/office/gg615597(v=office.14).aspx
+ 4
If you choose to work with the binary format, plan on creating lots of test files, each using a single feature, so you know what to expect. Before you code anything, write a program that dumps a file in hexadecimal so you can compare the records with the documentation. It will make your job much easier.
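A hex dump like that can be sketched in a few lines of Python. This is only an illustration, not a parser for the format: the function name `hex_dump` is made up, and the sample bytes are the 8-byte signature of the compound file container that legacy .xls files use, followed by some ASCII text.

```python
def hex_dump(data, width=16):
    """Return dump lines: offset, hex bytes, then printable ASCII."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Show printable ASCII as-is and everything else as a dot.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part:<{width * 3 - 1}}  {text_part}")
    return lines

# Sample bytes: compound file signature plus illustrative text.
sample = bytes.fromhex("d0cf11e0a1b11ae1") + b"Sheet1"
for line in hex_dump(sample):
    print(line)
```

For a real file, something like `open(path, "rb").read(512)` would supply the bytes, and dumping two files that differ by one feature makes the changed records easy to spot.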
+ 2
https://code.sololearn.com/c8mbun90l7HR/?ref=app
I made this for someone else, but it didn't help much.
This version currently reads integers only. If you want to read strings, you can use a class; a loop is all you need to read a string and split it accordingly.
I am working on the class version and will post it soon.
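Until then, here is a rough Python sketch of the read-and-split idea for mixed data. It is not the linked code: the function names and the sample line are made up, and it assumes plain comma-separated fields with no quoted commas (the standard `csv` module handles those).

```python
def parse_field(text):
    """Convert a field to int or float when possible, else keep the string."""
    text = text.strip()
    for convert in (int, float):
        try:
            return convert(text)
        except ValueError:
            pass
    return text

def parse_line(line, sep=","):
    """Split one line on the separator and convert each field."""
    return [parse_field(field) for field in line.split(sep)]

row = parse_line("Ann, F, 34, 1.68")
print(row)
```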