0
Handling Timeout Exceptions when Web Scraping
Besides a recursive exception handler, e.g. def initialScrape(link): try: scrape(link) except: return initialScrape(link), is there a better way to handle or even prevent timeout errors when scraping with requests-html? Can headers improve the process? If so, how, and how do you include headers when working with requests-html?
4 Answers
+ 1
There is no way to prevent timeout errors 100%; for example, you cannot do anything if the server is down. If you use several modules for scraping, you can prioritize them. Or simply set a timeout threshold, log the failure, and go on to the next HTML page.
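A minimal sketch of the "timeout threshold, log it, go to the next page" idea, using a bounded loop instead of the recursive handler from the question (which never stops if the server stays down). The names scrape_all, fetch, retry_on and the backoff values are made up for illustration; fetch is any function you supply that downloads one link and raises on failure.

```python
import logging
import time

logger = logging.getLogger("scraper")


def scrape_all(links, fetch, retry_on=(TimeoutError,), max_attempts=3,
               backoff_seconds=1.0, sleep=time.sleep):
    """Fetch each link with a bounded number of attempts.

    Returns (results, failed): results maps link -> whatever fetch returned,
    failed lists the links that never succeeded.
    """
    results, failed = {}, []
    for link in links:
        for attempt in range(1, max_attempts + 1):
            try:
                results[link] = fetch(link)
                break  # success, stop retrying this link
            except retry_on as exc:
                # %r shows the exception class, which hints at the cause
                logger.warning("attempt %d/%d failed for %s: %r",
                               attempt, max_attempts, link, exc)
                if attempt < max_attempts:
                    # wait 1x, 2x, 4x ... the base delay before retrying
                    sleep(backoff_seconds * 2 ** (attempt - 1))
        else:
            # the loop ended without a break: every attempt failed
            failed.append(link)
    return results, failed


# Hypothetical usage with requests-html (needs network, not run here).
# The User-Agent value and the 10 second timeout are placeholders.
#   import requests
#   from requests_html import HTMLSession
#   session = HTMLSession()
#   session.headers.update({"User-Agent": "my-scraper/0.1"})
#   fetch = lambda url: session.get(url, timeout=10)
#   results, failed = scrape_all(
#       links, fetch,
#       retry_on=(requests.exceptions.Timeout,
#                 requests.exceptions.ConnectionError))
```

HTMLSession builds on the requests Session, so headers can be set once with session.headers.update(...) or per call with session.get(url, headers=...). Headers will not fix a server that is down, but some sites respond slowly or not at all to clients without a browser-like User-Agent, so they may help in those cases. Note that requests raises its own Timeout class, which is why it is passed in explicitly through retry_on.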
+ 1
Yes. You can also check manually for issues such as a page being relocated or renamed, the server being shut down, or the HTML being restructured. Web scraping plugins are not gods; they are just simple download macros with simple logic…
0
abpatrick catkilltsoi
So make a log and retry the failed links later?
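One way that "log now, retry later" could look, as a rough sketch only: the failed links go into a JSON file, and a later run reads the file, tries each link again, and rewrites the file with whatever still fails. The function names and the file format are assumptions for illustration, and fetch is again any function you supply that downloads one link and raises on failure.

```python
import json
from pathlib import Path


def save_failed(path, failed_links):
    """Write the failed links (deduplicated, sorted) to a JSON file."""
    Path(path).write_text(json.dumps(sorted(set(failed_links)), indent=2))


def retry_failed(path, fetch):
    """Re-fetch every link listed in the file.

    Returns a dict of link -> result for the links that worked this time,
    and rewrites the file so it only holds the links that failed again.
    """
    file = Path(path)
    if not file.exists():
        return {}  # nothing was logged, nothing to retry
    results, still_failed = {}, []
    for link in json.loads(file.read_text()):
        try:
            results[link] = fetch(link)
        except Exception:  # broad on purpose in this sketch
            still_failed.append(link)
    save_failed(file, still_failed)
    return results
```

In a real scraper you would probably narrow the except clause to the timeout and connection errors your HTTP library raises, so that genuine bugs are not silently put back in the retry file.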
0
abpatrick catkilltsoi
Plugins that tell you why the page timed out? Now that'd be useful. 🤔
Could you suggest any I could use with requests-html?