+ 3

Requests-html vs. Selenium

So I'm just getting into web scraping and began my journey using BeautifulSoup4. While I enjoy its simplicity, it's very limited when scraping data from dynamic sites that rely heavily on JavaScript. My next step was going to be learning the Selenium framework, but then I discovered the requests-html library. I couldn't help but be attracted to its bs4-like simplicity combined with the ability to render JavaScript and extract data from the rendered page. My question is: can requests-html completely replace Selenium, or does it have limitations that still make Selenium necessary for some projects? P.S. I'm aware of the Scrapy framework, but for now I want to get a solid grip on the basics of scraping before I dive into the more advanced world and functionality of Scrapy.

26th Jan 2021, 7:09 PM
Reacy.Py
7 Answers
+ 4
Fatunmbi Teniola I am not talking about the requests library, I already use that with BeautifulSoup... Requests-html is a different library created by the same person who created the requests library... Requests-html is a standalone parser that does not need the help of BeautifulSoup, and its biggest difference is that it can render JavaScript and extract data from the rendered page... Which is why I was comparing it with Selenium...
4th Feb 2021, 5:30 PM
Reacy.Py
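For readers following along, here is a minimal sketch of what "rendering JavaScript" looks like with requests-html. It assumes the library is installed (`pip install requests-html`); the helper name `render_and_extract` and its parameters are hypothetical and not part of the library. As far as I know, the first call to `render()` downloads a headless Chromium build, so expect a delay on the first run.

```python
def render_and_extract(url, selector, timeout=20):
    """Fetch a page, run its JavaScript, and return the text of one element.

    Hypothetical helper for illustration; `url` and `selector` are supplied
    by the caller.
    """
    # Imported inside the function so this snippet loads even where
    # requests-html is not installed.
    from requests_html import HTMLSession

    session = HTMLSession()
    try:
        response = session.get(url)
        # Execute the page's JavaScript in headless Chromium and replace
        # response.html with the rendered markup.
        response.html.render(timeout=timeout)
        element = response.html.find(selector, first=True)
        return element.text if element is not None else None
    finally:
        session.close()
```

A call such as `render_and_extract("https://example.com", "h1")` returns the element's text after scripts have run, or `None` if nothing matches the selector.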
+ 3
Thanks again Fatunmbi Teniola! You helped me discover a big difference between the two, so now I have a better idea of which tool is best for which kind of project...
4th Feb 2021, 5:51 PM
Reacy.Py
+ 2
Fatunmbi Teniola So essentially requests-html can handle any project where the JavaScript renders without any interaction with the page, and Selenium will be useful for scraping pages that require some kind of interaction before the JavaScript renders. And I'm not trying to "just choose one to work with"; the root of my question is to discover the difference(s) between them so that I can choose the right one for the job at hand. You have to know the difference to know which is right for the job... Thanks for the help, I appreciate it!
4th Feb 2021, 5:17 PM
Reacy.Py
+ 1
Technically, requests can do a lot, but not browser automation. Selenium can automate tasks such as logging in through a form and clicking elements, which requests can't do. Rather than choosing one to work with, pick the one that best solves the problem at hand. Hope this helps.
4th Feb 2021, 12:37 PM
Fatunmbi Teniola
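To make the automation point concrete, here is a rough sketch of a Selenium interaction: load a page, click a button, then wait for the content that appears. It assumes Selenium 4 and a local Chrome installation; the helper name `click_and_read` and the selectors passed to it are hypothetical.

```python
def click_and_read(url, button_selector, result_selector, timeout=10):
    """Click an element, wait for another to appear, and return its text.

    Hypothetical helper for illustration; all arguments come from the caller.
    """
    # Imported inside the function so this snippet loads even where
    # Selenium is not installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # The interaction step: a real click in a real browser.
        driver.find_element(By.CSS_SELECTOR, button_selector).click()
        # Block until the script-generated element exists in the DOM.
        element = WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, result_selector))
        )
        return element.text
    finally:
        driver.quit()
```

The explicit wait matters here: content produced by a click usually arrives a moment later, so reading the page immediately after `click()` can miss it.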
+ 1
Exactly. For anything that involves interacting with the web browser, JavaScript, and automation, Selenium is your go-to library. Requests is used to download information from a web page. When used with a library like BeautifulSoup, you can extract specific parts of a page rather than the whole page.
4th Feb 2021, 5:23 PM
Fatunmbi Teniola
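The limitation of the "download, then parse" approach can be shown without any network access. The sketch below uses the standard library's `html.parser` as a stand-in for BeautifulSoup, and the page in `STATIC_HTML` is invented for illustration. The heading is present in the downloaded markup, but the price is only written by a script, so a static parser never sees it.

```python
from html.parser import HTMLParser

# Invented page: the price is filled in by JavaScript at runtime.
STATIC_HTML = """
<html><body>
  <h1 id="title">Product list</h1>
  <div id="prices"></div>
  <script>
    document.getElementById("prices").textContent = "19.99";
  </script>
</body></html>
"""


class TextById(HTMLParser):
    """Collect the text directly inside the element with a given id.

    Deliberately simple: it stops collecting at the first closing tag,
    so it only suits elements without nested children.
    """

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self._inside = False
        self.text = ""

    def handle_starttag(self, tag, attrs):
        if dict(attrs).get("id") == self.target_id:
            self._inside = True

    def handle_endtag(self, tag):
        self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.text += data.strip()


def text_of(html, element_id):
    parser = TextById(element_id)
    parser.feed(html)
    return parser.text


print(repr(text_of(STATIC_HTML, "title")))   # 'Product list'
print(repr(text_of(STATIC_HTML, "prices")))  # '' because the script never ran
```

Getting the price requires something that actually executes the script, which is where requests-html's rendering or Selenium comes in.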
+ 1
Okay. If requests-html solves your problem and you feel more comfortable with it, go for it. But remember, solving problems is what we do, and that's what these tools are for.
4th Feb 2021, 5:37 PM
Fatunmbi Teniola
+ 1
Anytime bro Reacy.Py
4th Feb 2021, 8:46 PM
Fatunmbi Teniola