What are the best Python libraries for web scraping and web crawling?

There are many libraries available, but the two best known are Beautiful Soup and Scrapy. Scrapy is a full-fledged framework: with a small amount of Python code you can write a spider, an automated bot that crawls web pages and scrapes them. Beautiful Soup is a library that lets a programmer pull specific elements out of a web page (for example, a list of images). Beautiful Soup alone is not enough, because you first have to fetch the page, which means pairing it with something like requests or urllib2 (urllib.request in Python 3); a parser such as lxml can be used underneath it. Those tools act somewhat like a web browser and retrieve pages from the internet, so with Beautiful Soup you have more work to do yourself than with Scrapy. Comparing the two, the Scrapy documentation is also easy for beginners to understand.
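To make the split between fetching and parsing concrete, here is a minimal sketch, not a definitive recipe. It assumes the bs4 package is installed and uses the standard library's urllib.request for the download step. The helper names fetch_html and extract_image_sources and the sample markup are made up for illustration.

```python
from urllib.request import urlopen

from bs4 import BeautifulSoup


def fetch_html(url):
    """Download a page and return its HTML as text.

    This is the step Beautiful Soup does not do for you.
    (Hypothetical helper; requests would work just as well here.)
    """
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def extract_image_sources(html):
    """Parse HTML and return the src of every <img> tag that has one."""
    # "html.parser" is the built-in parser; "lxml" could be used if installed.
    soup = BeautifulSoup(html, "html.parser")
    return [img["src"] for img in soup.find_all("img", src=True)]


# Made-up sample markup so the sketch runs without network access.
SAMPLE_HTML = """
<html><body>
  <img src="/static/logo.png" alt="logo">
  <p>Some text</p>
  <img src="https://example.com/photo.jpg">
  <img alt="no source attribute">
</body></html>
"""

if __name__ == "__main__":
    print(extract_image_sources(SAMPLE_HTML))
```

For a live page you would call extract_image_sources(fetch_html(url)) with a URL of your own. Scrapy bundles both steps, plus following links, into one framework.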

17th Feb 2018, 9:25 AM
Jithendhar jith
1 Answer
+ 7
Good summary. In the past I've found that some things are easier if you use Selenium WebDriver; it's worth a look: http://www.seleniumhq.org/docs/03_webdriver.jsp
17th Feb 2018, 9:34 AM
Louis