+ 1
PDF to CSV or Excel with Python
Is it possible to download a PDF from Python and transform the response into a CSV or an Excel file? I can parse the response and save it as a PDF, but I would like to turn it into CSV or Excel without the intermediate step of writing a PDF (although a solution that works from the PDF file is fine too). When I ask Python for the type of the response, it says "bytes". Thanks ☺️
6 Answers
+ 3
Zalo203 If possible, making your PDF available via a download link could be useful for people willing to take a closer look.
+ 1
A PDF can contain lots of different types of content: images, blocks of text, etc. Converting it to a spreadsheet format is not straightforward, and may not be possible at all.
What format is your original data in? If it can be captured in a pandas DataFrame, then it is really easy to export to Excel or CSV with pandas' built-in functions.
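For illustration, here is a minimal sketch of that export, assuming the data has already been captured as a list of records. The sample rows and the `save_excel` helper are made up for this example, and `to_excel` additionally needs an Excel engine such as openpyxl installed.

```python
import pandas as pd

# Hypothetical sample data standing in for whatever was captured from the source.
rows = [
    {"item": "apple", "qty": 3},
    {"item": "pear", "qty": 5},
]
df = pd.DataFrame(rows)

# With no path given, to_csv returns the CSV as a string;
# pass a filename (e.g. "output.csv") to write a file instead.
csv_text = df.to_csv(index=False)


def save_excel(frame, path="output.xlsx"):
    """Write the frame to an .xlsx file (requires an engine such as openpyxl)."""
    frame.to_excel(path, index=False)
```

Calling `save_excel(df)` would write `output.xlsx` in the current directory.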
+ 1
That is what I am trying to do: get the data inside the PDF into a DataFrame so I can manipulate it with pandas, but I can't find a way to do it. The PDF is mostly text. I tried an online converter and it worked quite well, but I'm trying to avoid having to upload every PDF and then download the converted file.
Thanks for answering ☺️
+ 1
OK, so does the PDF contain some sort of table? If you have to do this repeatedly, does the PDF consistently have the same structure? Even the same number of table rows and columns?
In any case, you will need to find the right library to process the PDF. I would start looking here:
https://realpython.com/pdf-python/
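As a rough sketch of what that could look like for a mostly-text PDF: this assumes the third-party pypdf package is an acceptable choice of library, and that columns in the extracted text are separated by two or more spaces. Both are assumptions, and the helper names are hypothetical.

```python
import csv
import io
import re


def pdf_bytes_to_text(pdf_bytes):
    """Extract plain text from PDF bytes held in memory (no file on disk)."""
    # Third-party dependency (pip install pypdf); imported here so the
    # text-handling helpers below work without it.
    from pypdf import PdfReader

    reader = PdfReader(io.BytesIO(pdf_bytes))
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def text_to_rows(text):
    """Split each non-empty line into cells on runs of 2+ whitespace characters.

    The splitting rule is an assumption about the layout; adjust it to the PDF.
    """
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if line:
            rows.append(re.split(r"\s{2,}", line))
    return rows


def rows_to_csv(rows):
    """Serialize the rows as CSV text using the standard csv module."""
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue()
```

If the download gives you `bytes`, something like `rows_to_csv(text_to_rows(pdf_bytes_to_text(data)))` would produce CSV text that can be written to a file or loaded into pandas. How well it works depends heavily on how the PDF lays out its text.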
+ 1
It doesn't have tables exactly, but the content can be read as a kind of table, so I can filter it somehow once the data is in Excel or a similar format. I will take a look at the link.
Many thanks ☺️
+ 1
Yes, but unfortunately I can't upload it, sorry. A method that works for a standard PDF without tables or images would be useful.
Thanks for the help.