PDF file download from the web
I wanted to know if there is any way I can download all the PDF files on a website by running Python code. If so, please tell me how to do it.
5 Answers
+ 3
Do you just want to download them as files, or parse them and extract their text?
+ 3
The approach I'd suggest is to first retrieve the contents of the remote directory, filter that list down to only the *.pdf files, and then download them one by one in a loop.
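A minimal sketch of the first two steps (list the links, keep only the PDFs) in Python, using only the standard library. It assumes the PDFs are linked with ordinary `<a href>` tags on a single HTML page, since a web server usually serves an HTML page or index rather than a raw directory listing. `extract_pdf_links`, `fetch_pdf_links` and the `example.com` URL are hypothetical names for illustration. It does not crawl other pages, and links generated by JavaScript would not be found.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_pdf_links(html, base_url):
    """Return absolute URLs of links whose path ends in .pdf (no duplicates, page order kept)."""
    collector = LinkCollector()
    collector.feed(html)
    pdf_links = []
    for href in collector.hrefs:
        absolute = urljoin(base_url, href)  # resolve relative links such as "report.pdf"
        if urlparse(absolute).path.lower().endswith(".pdf") and absolute not in pdf_links:
            pdf_links.append(absolute)
    return pdf_links


def fetch_pdf_links(page_url):
    """Download one HTML page and return the PDF links found on it."""
    with urlopen(page_url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return extract_pdf_links(html, page_url)


if __name__ == "__main__":
    # Offline demo on a hard-coded snippet; no network access needed
    sample = '<a href="report.pdf">R</a> <a href="/files/Guide.PDF?v=2">G</a> <a href="index.html">I</a>'
    print(extract_pdf_links(sample, "https://example.com/docs/"))
```

Calling `fetch_pdf_links("https://example.com/docs/")` (placeholder URL) would return the list to loop over when downloading. The check is done on the URL path, so links such as `Guide.PDF?v=2` are still recognized.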
Two things:
1. This must be a public, non-restricted repository; otherwise the script will most likely get blocked or banned.
2. I still have to check whether a simple read/write works for .pdf files.
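On the read/write question: PDFs are binary files, so the downloaded bytes should be written unchanged to a file opened in binary mode (`"wb"`) rather than read and written as text. A minimal standard-library sketch, where `download_file`, `download_all` and the naming rule (use the last path segment of the URL) are hypothetical choices for illustration:

```python
import os
import shutil
from urllib.parse import unquote, urlparse
from urllib.request import urlopen


def filename_from_url(url):
    """Use the last path segment of the URL (percent-decoded) as the local file name."""
    name = os.path.basename(unquote(urlparse(url).path))
    return name or "download.pdf"  # fallback when the URL path ends in "/"


def download_file(url, dest_dir="."):
    """Save the resource at url into dest_dir byte-for-byte and return the local path."""
    os.makedirs(dest_dir, exist_ok=True)
    path = os.path.join(dest_dir, filename_from_url(url))
    # "wb" writes raw bytes: no text decoding and no newline translation
    with urlopen(url) as response, open(path, "wb") as out:
        shutil.copyfileobj(response, out)  # copy in chunks rather than all at once
    return path


def download_all(urls, dest_dir="."):
    """Download each URL in turn; report and skip the ones that fail."""
    saved = []
    for url in urls:
        try:
            saved.append(download_file(url, dest_dir))
        except OSError as err:  # URLError and HTTPError are subclasses of OSError
            print(f"skipped {url}: {err}")
    return saved
```

Because each file is handled separately, one broken link is reported and skipped instead of stopping the whole run. Files with the same name from different folders would overwrite each other under this naming rule.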
I'll get back to it this evening, so keep an eye out :)
+ 3
Please let me know if this works for you (it won't run on Sololearn itself):
https://code.sololearn.com/cezlCwkWIF0E/?ref=app
It's not perfect and doesn't catch exceptions, but with the path tested in the code, it works for me!
+ 2
I'll get back to you in an hour or so...
0
Download them as files.