+ 1
How to search for a list of text strings in a PDF using Python? PyPDF2 was not extracting text as expected; requesting other suggestions
I used PyPDF2, and it didn't extract text as expected when the PDF has rich graphics and a large number of pages. Could anyone suggest other ways to solve this?
2 Answers
+ 9
Hi Gowtham rajasekher,
Why don't you use textract? -
http://textract.readthedocs.io/en/latest/
https://github.com/deanmalmgren/textract
It supports many types of files, including PDFs.
Example -
import textract
# process() returns bytes, so decode to get a str
text = textract.process("path/to/file.extension").decode("utf-8")
Hope this helps ✌️
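Once the text has been extracted as a string, searching it for a list of terms needs only the standard library. Here is a minimal sketch: the helper name `find_terms` and the sample string are made up for illustration, and the matching is a plain case-insensitive substring count, which may or may not be what the question needs.

```python
import re

def find_terms(text, terms):
    """Return {term: count} of case-insensitive occurrences of each term in text."""
    # Collapse runs of whitespace, since PDF extraction often breaks lines mid-phrase.
    normalized = re.sub(r"\s+", " ", text).lower()
    return {term: normalized.count(term.lower()) for term in terms}

# Sample text standing in for the output of a PDF extractor.
sample = "Invoice total:\n42 USD. Invoice   date: 2020-01-01"
print(find_terms(sample, ["invoice", "total", "refund"]))
```

A count of 0 means the term was not found, which can also happen when the extractor missed that part of the page.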
0
Thanks for the suggestion. I will try it and let you know.
For now, I have used a package called PDF Miner, which extracted the text well even for PDFs with a large number of pages and rich graphics.
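For anyone who wants to try the same route, here is a rough sketch of page-by-page extraction plus a term search. It assumes the pdfminer.six fork (the reply above does not say which PDF Miner package was used); the helper names and the `report.pdf` path are hypothetical.

```python
def iter_page_texts(pdf_path):
    """Yield the text of each page in turn (assumes pdfminer.six is installed)."""
    # Imported lazily so the search helper below works without pdfminer.six.
    from pdfminer.high_level import extract_pages
    from pdfminer.layout import LTTextContainer

    for page_layout in extract_pages(pdf_path):
        yield "".join(
            element.get_text()
            for element in page_layout
            if isinstance(element, LTTextContainer)
        )

def pages_containing(page_texts, terms):
    """Map each term to the 1-based page numbers where it appears (case-insensitive)."""
    hits = {term: [] for term in terms}
    for page_number, page_text in enumerate(page_texts, start=1):
        lowered = page_text.lower()
        for term in terms:
            if term.lower() in lowered:
                hits[term].append(page_number)
    return hits

# Hypothetical usage; "report.pdf" is a placeholder path:
# print(pages_containing(iter_page_texts("report.pdf"), ["revenue", "forecast"]))
```

Working one page at a time keeps memory use low for long documents and tells you where each term was found, not just whether it occurs.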