
How can I insert values from a PDF into SQL?

12th Nov 2017, 5:33 PM
Rahid
4 Answers
That's a very broad question. The most important problem here is parsing the PDF file. Do you know its format? Is it constant? Is it a scan or a text file with some structure? What structure? Only after answering all of those questions can you move on to putting the values into SQL, which, compared to the text-mining task, will be a 🍰.
12th Nov 2017, 5:39 PM
Kuba Siekierzyński
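A minimal sketch of the two stages described above, in Python with the standard-library `sqlite3` module (the thread names no language or database, so both are assumptions). The record format `name, quantity, price` and the `items` table are hypothetical. The regular expression in `parse_records` is the part that would have to change for a real document, while the SQL side stays small.

```python
import re
import sqlite3

# Hypothetical extracted text: one record per line, "name, quantity, price".
# Lines that do not match the pattern (titles, page footers) are skipped.
EXTRACTED_TEXT = """\
Invoice 2017-11
apple, 3, 0.50
pear, 10, 0.25
Page 1 of 1
"""

RECORD = re.compile(r"^\s*(\w+)\s*,\s*(\d+)\s*,\s*(\d+(?:\.\d+)?)\s*$")


def parse_records(text):
    """Stage 1: turn raw text into typed tuples (the hard, format-specific part)."""
    records = []
    for line in text.splitlines():
        match = RECORD.match(line)
        if match:
            name, qty, price = match.groups()
            records.append((name, int(qty), float(price)))
    return records


def load_records(conn, records):
    """Stage 2: insert the tuples (the easy part) with parameterized SQL."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS items (name TEXT, quantity INTEGER, price REAL)"
    )
    conn.executemany("INSERT INTO items VALUES (?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")
records = parse_records(EXTRACTED_TEXT)
load_records(conn, records)
print(conn.execute("SELECT * FROM items").fetchall())
```

Using `?` placeholders with `executemany` lets the driver handle quoting, so a value containing an apostrophe cannot break the statement.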
Oh, it *matters a lot* what format the file is in... The pandas module has powerful methods for reading text files, but only if you are sure the files are correct and well structured. For more advanced parsing of PDF files you will need an external module such as PDFMiner or PyPDF.
12th Nov 2017, 6:37 PM
Kuba Siekierzyński
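A sketch of the pandas route for a text file that is already well structured. It assumes pandas is installed; the semicolon-separated content with a header row and the `people` table are made up for illustration. `pandas.read_csv` and `DataFrame.to_sql` are real pandas functions, and `to_sql` accepts a plain `sqlite3` connection.

```python
import io
import sqlite3

import pandas as pd

# Hypothetical well-structured text: a header line, then one record per line,
# with fields separated by semicolons.
raw_text = """name;age;city
Alice;30;London
Bob;25;Paris
"""

# read_csv accepts any file-like object; sep sets the field delimiter.
df = pd.read_csv(io.StringIO(raw_text), sep=";")

# Write the whole frame into a table of an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
df.to_sql("people", conn, index=False, if_exists="replace")

rows = conn.execute("SELECT name, age, city FROM people ORDER BY name").fetchall()
print(rows)
```

If the text is not this regular, `read_csv` will either raise a tokenizing error or quietly fill missing fields with NaN, which is exactly the "correct and well structured" caveat.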
If it's a one-time task, I usually load the text file into Sublime Text, use Search and Replace with regular expressions to turn it into a bunch of INSERT statements, and run the result as a script. If the job must be done repeatedly by users, I usually write a GUI program.
12th Nov 2017, 6:14 PM
deFault
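The same search-and-replace idea can also be scripted instead of done by hand in an editor. This is a sketch under assumptions: the tab-separated `code`/`description` lines and the `products` table are hypothetical, and SQLite stands in for whatever database is actually used. Single quotes are doubled so the generated string literals stay valid, which a quick editor replace can easily miss.

```python
import re
import sqlite3

# Hypothetical text copied out of the document: "code<TAB>description" per line.
raw = "A1\tRed pen\nB2\tBlue pen\nC3\tO'Brien's notebook\n"


def to_insert(match):
    # Double any single quotes so each SQL string literal stays well formed.
    code = match.group(1).replace("'", "''")
    desc = match.group(2).replace("'", "''")
    return f"INSERT INTO products (code, description) VALUES ('{code}', '{desc}');"


# Same pattern you would type into an editor's regex find box: ^(\S+)\t(.+)$
# Each matching line is rewritten into one INSERT statement.
script = re.sub(r"^(\S+)\t(.+)$", to_insert, raw, flags=re.MULTILINE)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (code TEXT, description TEXT)")
conn.executescript(script)  # run all generated statements as one script
rows = conn.execute("SELECT * FROM products ORDER BY code").fetchall()
print(rows)
```

For a job that runs repeatedly, parameterized queries are safer than generating SQL text, since the driver then does the quoting.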
It's a text file, not a scan. It doesn't matter whether I insert from a PDF or a *.docx; it can also be a *.txt file.
12th Nov 2017, 5:50 PM
Rahid
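Whether the source is a PDF, a .docx or a .txt does matter for the first step, because each format needs its own text extraction; after that, parsing and inserting work the same way. A sketch of that split using only the standard library, under assumptions: `extract_lines` and its helpers are hypothetical names, the .docx reader only collects the text of `<w:t>` runs from `word/document.xml` (ignoring headers, footnotes and formatting), and PDF input is left to a library such as PDFMiner or PyPDF. The demo builds a stripped-down .docx containing only `word/document.xml`, which is enough for this function but is not a file Word would open.

```python
import os
import tempfile
import xml.etree.ElementTree as ET
import zipfile

# Namespace of WordprocessingML, the XML inside a .docx package.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_lines(path):
    """Return the non-empty paragraphs of a .docx file as plain strings."""
    with zipfile.ZipFile(path) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    lines = []
    for paragraph in root.iter(W_NS + "p"):
        # A paragraph's text may be split across several <w:t> runs.
        text = "".join(node.text or "" for node in paragraph.iter(W_NS + "t"))
        if text:
            lines.append(text)
    return lines


def txt_lines(path, encoding="utf-8"):
    """Return the non-empty lines of a plain-text file."""
    with open(path, encoding=encoding) as handle:
        return [line.rstrip("\n") for line in handle if line.strip()]


def extract_lines(path):
    """Dispatch on the file extension; only this extraction step differs."""
    ext = os.path.splitext(path)[1].lower()
    if ext == ".docx":
        return docx_lines(path)
    if ext == ".txt":
        return txt_lines(path)
    # PDF text extraction needs a third-party library such as PDFMiner or PyPDF.
    raise ValueError(f"unsupported file type: {ext}")


# Demo: write the same two records as a .txt and as a minimal .docx.
with tempfile.TemporaryDirectory() as tmp:
    txt_path = os.path.join(tmp, "data.txt")
    with open(txt_path, "w", encoding="utf-8") as fh:
        fh.write("apple, 3\npear, 10\n")

    docx_path = os.path.join(tmp, "data.docx")
    body = "".join(
        f"<w:p><w:r><w:t>{text}</w:t></w:r></w:p>" for text in ("apple, 3", "pear, 10")
    )
    document_xml = (
        '<w:document xmlns:w="http://schemas.openxmlformats.org/'
        'wordprocessingml/2006/main"><w:body>' + body + "</w:body></w:document>"
    )
    with zipfile.ZipFile(docx_path, "w") as package:
        package.writestr("word/document.xml", document_xml)

    from_txt = extract_lines(txt_path)
    from_docx = extract_lines(docx_path)

print(from_txt == from_docx)
```

Once both formats yield the same list of lines, the rest of the pipeline (regex parsing, then the SQL inserts) does not need to know where the text came from.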