+ 1

How can I search for a list of text strings in a PDF using Python? PyPDF2 did not extract the text as expected; looking for other suggestions

I used PyPDF2, but it did not extract the text as expected when the PDF has rich graphics and a large number of pages. I would appreciate other ideas for working around this issue.

4th Dec 2020, 5:57 AM
Gowtham rajasekher
2 Answers
+ 9
Hi Gowtham rajasekher, why not try textract? It supports many file types, including PDFs.
Docs: http://textract.readthedocs.io/en/latest/
Source: https://github.com/deanmalmgren/textract
Example:
import textract
text = textract.process("path/to/file.extension")
Hope this helps ✌️
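To connect the textract call to the original question (searching for a list of strings), here is a minimal sketch. It assumes textract is installed and that textract.process returns bytes that need decoding; the helper names extract_text and find_terms, and the file path in the usage comment, are hypothetical and only for illustration.

```python
def extract_text(path):
    """Extract text from a file with textract and return it as a str."""
    # Imported lazily so the search helper below works without textract installed.
    import textract  # third-party: pip install textract

    # textract.process returns bytes; decode them before searching.
    return textract.process(path).decode("utf-8", errors="replace")


def find_terms(text, terms):
    """Return {term: True/False} for a case-insensitive substring search."""
    lowered = text.lower()
    return {term: term.lower() in lowered for term in terms}


# Usage (hypothetical file name):
# text = extract_text("report.pdf")
# print(find_terms(text, ["invoice", "total amount"]))
```

This only reports whether each term occurs somewhere in the document. Line breaks in the extracted text can split a multi-word phrase, so it may help to normalize whitespace before searching.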
4th Dec 2020, 7:02 AM
Piyush
0
Thanks for the suggestion; I will try it and let you know. For now, I used a package called PDFMiner, which extracted the text well even from PDFs with many pages and rich graphics.
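For anyone who wants to try the same route, here is a rough sketch of searching a list of terms page by page. It assumes the pdfminer.six fork (pip install pdfminer.six) and its extract_pages helper; the post does not say which PDFMiner variant or API was actually used. The names iter_page_texts, normalize, and pages_containing are hypothetical.

```python
import re


def iter_page_texts(path):
    """Yield the text of each page, assuming pdfminer.six is installed."""
    # Imported lazily so the search logic below can be used on its own.
    from pdfminer.high_level import extract_pages
    from pdfminer.layout import LTTextContainer

    for page_layout in extract_pages(path):
        # Keep only layout elements that carry text (skips figures, lines, etc.).
        yield "".join(
            element.get_text()
            for element in page_layout
            if isinstance(element, LTTextContainer)
        )


def normalize(s):
    """Collapse runs of whitespace and lowercase, so line breaks do not hide matches."""
    return re.sub(r"\s+", " ", s).strip().lower()


def pages_containing(page_texts, terms):
    """Return {term: [1-based page numbers where the term occurs]}."""
    hits = {term: [] for term in terms}
    for page_number, text in enumerate(page_texts, start=1):
        norm = normalize(text)
        for term in terms:
            if normalize(term) in norm:
                hits[term].append(page_number)
    return hits


# Usage (hypothetical file name):
# print(pages_containing(iter_page_texts("report.pdf"), ["invoice", "total amount"]))
```

Processing one page at a time keeps memory use modest for long documents and tells you where each term was found, not just whether it exists.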
9th Dec 2020, 6:05 PM
Gowtham rajasekher