0

Handling Timeout Exceptions when Web Scraping

Besides a recursive exception handler, e.g.

    def initialScrape(link):
        try:
            scrape(link)
        except:
            return initialScrape(link)

is there a better way to handle, or even prevent, timeout errors while scraping with requests-html? Can headers improve the process? If so, how, and how do you include headers when working with requests-html?

16th Aug 2022, 3:32 PM
sonofpeter.exe
4 Answers
+ 1
There is no way to prevent timeout errors 100%; for example, you cannot do anything about a server that is down. If you use several modules for scraping, you can prioritize them. Or simply set a timeout threshold, log the failure, and go on to the next HTML page.
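A minimal sketch of that idea, replacing the recursive handler with a bounded retry loop. The names fetch_with_retries, scrape_all, and HEADERS, the User-Agent value, and the retry/backoff numbers are all hypothetical, chosen only for illustration. The session argument is assumed to be a requests-style session such as requests-html's HTMLSession, which builds on requests.Session and so accepts headers and timeout on get().

```python
import logging
import time

log = logging.getLogger("scraper")

# Hypothetical header values; adjust to whatever the target site expects.
HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; example-scraper/0.1)"}


def fetch_with_retries(session, link, retries=3, timeout=10, backoff=1.0,
                       retry_on=(Exception,)):
    """Try `link` up to `retries` times; return the response or None."""
    for attempt in range(1, retries + 1):
        try:
            # requests-style sessions accept `headers` and `timeout` on get().
            return session.get(link, headers=HEADERS, timeout=timeout)
        except retry_on as exc:
            log.warning("attempt %d/%d failed for %s: %r",
                        attempt, retries, link, exc)
            if attempt < retries:
                time.sleep(backoff * attempt)  # wait a little longer each time
    return None


def scrape_all(session, links, **kwargs):
    """Fetch every link, skipping the ones that keep failing."""
    pages, failed = {}, []
    for link in links:
        response = fetch_with_retries(session, link, **kwargs)
        if response is None:
            failed.append(link)  # already logged; move on to the next page
        else:
            pages[link] = response
    return pages, failed
```

With requests-html this could be called as scrape_all(HTMLSession(), links), ideally passing retry_on=(requests.exceptions.Timeout, requests.exceptions.ConnectionError) so that only network failures are retried. Unlike the recursive version, the loop cannot recurse forever when a server stays down.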
16th Aug 2022, 3:46 PM
abpatrick catkilltsoi
+ 1
Yes. You can also check the failed links manually to see whether there are issues such as a page being moved or renamed, the server shutting down, or the HTML being restructured. A web scraping plugin is not a god; it is just a simple downloader macro with basic logic…
16th Aug 2022, 3:54 PM
abpatrick catkilltsoi
0
abpatrick catkilltsoi So make a log and retry the failed links later?
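Something like this rough sketch is what I have in mind. The function name retry_failed_later, the fetch callable, and the pass count and wait time are all made up for illustration; fetch stands for whatever function downloads one link and raises on failure.

```python
import time


def retry_failed_later(failed_links, fetch, passes=2, wait_seconds=60):
    """Re-run `fetch` on failed links in later passes.

    Returns (results, still_failed).
    """
    results = {}
    pending = list(failed_links)
    for _ in range(passes):
        if not pending:
            break
        time.sleep(wait_seconds)  # give the server some time to recover
        still_failed = []
        for link in pending:
            try:
                results[link] = fetch(link)
            except Exception:
                still_failed.append(link)  # keep it for the next pass
        pending = still_failed
    return results, pending
```

Whatever is left in the second return value after the last pass would be the links to inspect by hand.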
16th Aug 2022, 3:50 PM
sonofpeter.exe
0
abpatrick catkilltsoi Plugins that tell you why the page timed out? Now that'd be useful. 🤔 Could you suggest any I could use with requests-html?
16th Aug 2022, 4:04 PM
sonofpeter.exe