0

PDF file download from the web

Is there any way to download all the PDF files on a website by running Python code? If so, please tell me how to do it.

7th Apr 2017, 3:13 PM
Vijeth Belle
5 Answers
+ 3
Do you want to just download them as files, or parse them and extract the text?
7th Apr 2017, 3:23 PM
Kuba Siekierzyński
+ 3
The approach I'd suggest is to first retrieve the contents of the remote directory, keep only the *.pdf files, and then download them one by one by looping through the list. Two things: 1. This must be a public, unrestricted repository; otherwise the script will most likely get blocked or banned. 2. I still have to check whether a simple read/write will do for .pdf files. I'll get back to it this evening, so look out :)
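For illustration, here is a minimal sketch of that approach using only the Python standard library. It assumes the "directory" is an ordinary HTML page whose links point at the PDFs; the helper names (`LinkCollector`, `find_pdf_links`, `download_pdfs`) and the example.com address are made up for this sketch. On the read/write question: PDFs are binary, so writing the downloaded bytes to a file opened in "wb" mode is enough.

```python
import os
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def find_pdf_links(html, base_url):
    """Return absolute URLs whose path ends in .pdf, in page order, no duplicates."""
    collector = LinkCollector()
    collector.feed(html)
    links = []
    for href in collector.hrefs:
        url = urljoin(base_url, href)  # resolve relative links against the page
        if urlparse(url).path.lower().endswith(".pdf") and url not in links:
            links.append(url)
    return links


def download_pdfs(page_url, dest_dir="pdfs"):
    """Fetch page_url, then save every linked PDF into dest_dir."""
    os.makedirs(dest_dir, exist_ok=True)
    with urlopen(page_url) as response:
        html = response.read().decode("utf-8", errors="replace")
    for url in find_pdf_links(html, page_url):
        filename = os.path.basename(urlparse(url).path)
        with urlopen(url) as pdf, open(os.path.join(dest_dir, filename), "wb") as out:
            out.write(pdf.read())  # binary write, bytes are stored unchanged
        print("saved", filename)


# Hypothetical address; replace it with a public page you are allowed to crawl.
# download_pdfs("https://example.com/docs/")
```

This only follows links on a single page. Crawling a whole site would mean repeating the same steps for every page you discover, ideally with a pause between requests.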
7th Apr 2017, 3:39 PM
Kuba Siekierzyński
+ 3
Please let me know if this works for you (it won't run inside Sololearn): https://code.sololearn.com/cezlCwkWIF0E/?ref=app It's not perfect and does not catch exceptions, but with the test path used in the code, it works for me! 😎
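On the missing exception handling: one possible way to add it is sketched below. This is not the linked code, just an assumed helper (`try_download`, with made-up `opener` and `timeout` parameters) that wraps a single download so one broken link does not stop the whole loop.

```python
from urllib.error import URLError
from urllib.request import urlopen


def try_download(url, dest_path, opener=urlopen, timeout=30):
    """Download url to dest_path; return True on success, False on failure."""
    try:
        with opener(url, timeout=timeout) as response:
            data = response.read()
        with open(dest_path, "wb") as out:
            out.write(data)
    except (URLError, OSError, ValueError) as err:
        # URLError covers HTTP and connection errors, OSError covers file
        # problems, ValueError covers malformed URLs.
        print("skipped", url, "-", err)
        return False
    return True
```

The `opener` argument defaults to `urlopen` and exists only so the function can be tried out with a stand-in that does not touch the network.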
7th Apr 2017, 6:38 PM
Kuba Siekierzyński
+ 2
I'll get back to you in an hour or so... 🐍
7th Apr 2017, 3:26 PM
Kuba Siekierzyński
0
Just download them as files.
7th Apr 2017, 3:24 PM
Vijeth Belle