0
How to extract the text of all the pages of a PDF using pdfplumber?
Can anyone help me? I need the source code.
13 Answers
+ 1
import pdfplumber as pdfp
from gtts import gTTS

pdfToString = ""
with pdfp.open('/storage/emulated/0/Download/filename.pdf') as pdf:
    for page in pdf.pages:
        # extract_text() can come back empty for pages without a text layer
        text = page.extract_text() or ""
        print(text)
        pdfToString += text + "\n"

# lang='de' selects German; use 'en' for English
pdfToSpeech = gTTS(pdfToString, lang='de')
pdfToSpeech.save('/storage/emulated/0/Music/pdfToSpeech_deutsch.mp3')
This is what I put together quickly from the gtts documentation.
You can choose a language via the lang parameter of gTTS (for me it is German -> 'de'; English would be 'en').
For more languages, see the documentation.
+ 1
Hi Ujjawal Gupta,
Try this:
import pdfplumber as pdfp

with pdfp.open('/storage/emulated/0/Download/filename.pdf') as pdf:
    for page in pdf.pages:
        print(page.extract_text())
Of course, you should adjust the file path passed to the open() method.
Hope this helps...
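If you want all pages in one string rather than printed page by page, here is a minimal sketch. It assumes that extract_text() may return None or an empty string for a page without a text layer (a scanned image, for example); as far as I know this depends on the pdfplumber version. The helper name collect_text and its separator parameter are made up for illustration and are not part of pdfplumber.

```python
def collect_text(pages, separator="\n"):
    """Join the text of every page into one string.

    `pages` is any iterable of objects with an extract_text() method,
    for example pdf.pages from pdfplumber. Pages without text are skipped.
    """
    parts = []
    for page in pages:
        text = page.extract_text()
        if text:  # skip None and empty strings
            parts.append(text)
    return separator.join(parts)


# Usage with pdfplumber (the path is a placeholder, adjust it):
# import pdfplumber as pdfp
# with pdfp.open('/storage/emulated/0/Download/filename.pdf') as pdf:
#     print(collect_text(pdf.pages))
```

Joining with a newline keeps the last word of one page from running into the first word of the next.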
+ 1
Thank you buddy
You just saved my life!
+ 1
On my PC, pyttsx3 runs about 6 times faster than gtts, although the speech from gtts is much nicer and sounds more natural.
0
Hey G B,
Can you please tell me how I can convert this text into speech using the gtts module?
0
Hey bro,
Why is gtts so slow?
It takes too much time to execute.
Is there any way to reduce the time?
0
Hi,
gtts uses the Internet. The processing takes place in the cloud.
So to speed this up, you may need faster Internet access.
Alternatively, you could try pyttsx3. This is an offline text-to-speech library.
Unfortunately, it does not work on Android.
Here's a little reference code:
import pdfplumber as pdfp
from time import time
from gtts import gTTS
import pyttsx3

pdfToString = ""
with pdfp.open('/storage/emulated/0/Download/file.pdf') as pdf:
    for page in pdf.pages:
        # extract_text() can come back empty for pages without a text layer
        text = page.extract_text() or ""
        print(text)
        pdfToString += text + "\n"

# Online conversion with gtts (needs Internet access)
print("starting gtts...")
start = time()
pdfToSpeech = gTTS(pdfToString)
pdfToSpeech.save('/storage/emulated/0/Music/pdfToSpeech.mp3')
stop = time()
print(f"gtts finished after {stop - start} seconds")

# Offline conversion with pyttsx3 (adjust the paths when running on a PC)
print("starting pyttsx3...")
start = time()
engine = pyttsx3.init()
engine.save_to_file(pdfToString, '/storage/emulated/0/Music/pdfToSpeech2.mp3')
engine.runAndWait()
stop = time()
print(f"pyttsx3 finished after {stop - start} seconds")
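If you want to compare more than two conversions, the start/stop pattern can be wrapped in a small helper. This is only a sketch: timed is a hypothetical name of my own, not something from gtts or pyttsx3, and it uses perf_counter from the standard library, which is meant for measuring short durations.

```python
from time import perf_counter


def timed(label, func, *args, **kwargs):
    """Run func(*args, **kwargs), print how long it took,
    and return a (result, seconds) tuple."""
    start = perf_counter()
    result = func(*args, **kwargs)
    elapsed = perf_counter() - start
    print(f"{label} finished after {elapsed:.2f} seconds")
    return result, elapsed


# Usage idea (output path is a placeholder):
# timed("gtts", lambda: gTTS(pdfToString).save('pdfToSpeech.mp3'))
```

Keep in mind that the gtts timing includes the network round trip, so it will vary from run to run.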
0
Yes, you are right.
0
But is there any way to reduce the runtime?
0
Thank you so much
0
No Problem :)
I'm afraid I don't know a way to reduce the runtime other than using pyttsx3.
0
Is the Internet the only cause of the problem?
0
I think so. But you have only limited influence over that. Over the Internet, data always travels more slowly than inside your PC or smartphone. So even with the fastest possible Internet access, I think gtts will never be as fast as pyttsx3.