+ 1

I'm learning a bit of web scraping with Python. How, why, and when should I use regex and/or bs4 to parse HTML pages?

web scraping with Python

python web regex bs4 web-scraping

24th Oct 2017, 4:43 AM

Donald Chinhuru

6 Answers

+ 12

BeautifulSoup4 (bs4) lets you strip and decompose a web page into its building elements, so it deals with the HTML from the structural side. That is what you need when you want to reach, let's say, the sixth hyperlink in the second row, third column of a table preceded by two <div> tags. RegEx (Python's re module) lets you parse the text itself, looking for character-level matches inside it. It can find even the most complex character combinations in the HTML file, essentially treating it like a plain text file. For most parsing tasks you will probably need both :)
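A minimal sketch of the two approaches side by side, assuming the third-party beautifulsoup4 package is installed (pip install beautifulsoup4). The sample HTML, the helper names `link_in_cell` and `prices_in_text`, and the dollar-price pattern are hypothetical illustrations, not part of the original answer. bs4 walks the table structure to reach the sixth link in the second row, third column; re then pulls a price out of that link's text.

```python
import re

from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

# Hypothetical sample page: a table wrapped in two <div> tags.
HTML = """
<div><div>
<table>
  <tr><td>a</td><td>b</td><td>$9.99</td></tr>
  <tr>
    <td>x</td><td>y</td>
    <td>
      <a href="/item/1">Item 1</a> <a href="/item/2">Item 2</a>
      <a href="/item/3">Item 3</a> <a href="/item/4">Item 4</a>
      <a href="/item/5">Item 5</a> <a href="/item/6">Item 6 (price: $42.50)</a>
    </td>
  </tr>
</table>
</div></div>
"""

# Regex side: a dollar sign followed by digits and optional cents (illustrative pattern).
PRICE_RE = re.compile(r"\$(\d+(?:\.\d{2})?)")


def link_in_cell(html, row, col, n):
    """Structure side (bs4): return the n-th <a> tag in table cell (row, col), all 1-based."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    cell = table.find_all("tr")[row - 1].find_all("td")[col - 1]
    return cell.find_all("a")[n - 1]


def prices_in_text(text):
    """Text side (re): every dollar amount in a string, as floats."""
    return [float(amount) for amount in PRICE_RE.findall(text)]


if __name__ == "__main__":
    link = link_in_cell(HTML, row=2, col=3, n=6)
    print(link["href"])                     # /item/6
    print(prices_in_text(link.get_text()))  # [42.5]
    # Regex alone treats the whole page as text and finds every price on it.
    print(prices_in_text(HTML))             # [9.99, 42.5]
```

Running the regex over the raw HTML returns every price on the page, while bs4 first narrows the search down to one cell. That division of labour is why the two tools are usually combined.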

24th Oct 2017, 5:41 AM

Kuba Siekierzyński

+ 8

@Sayan I'd rather treat it as a nice way to behave ;)

24th Oct 2017, 3:12 PM

Kuba Siekierzyński

+ 7

No problem. Would you mind marking my answer as best in this case? :) It actually helps keep my XP on the rise ;)

24th Oct 2017, 2:37 PM

Kuba Siekierzyński

+ 1

Thank you, Kuba. Your answer has been helpful. https://www.sololearn.com/discuss/811313/?ref=app

24th Oct 2017, 2:21 PM

Donald Chinhuru

+ 1

😂😂😂 @Kuba... that was a hell of a cute demand

24th Oct 2017, 2:45 PM

sayan chandra

+ 1

😁😁😁 OK man, I did

24th Oct 2017, 3:34 PM

Donald Chinhuru
