0
Extract addresses from a web page with Python
I need to complete code that extracts addresses from an HTML web page. By address I mean landmark, city, PIN code, and mobile/telephone number.
11 Answers
+ 1
Use bs4 to scrape the page and regex to refine the results.
+ 1
bs4 is Beautiful Soup, just the most up-to-date version that I know of.
The re (regex) module is part of Python's standard library. requests and bs4 are third-party packages, so install them with pip first.
There is a regex tutorial here on SL, and a quick search will turn up help with the requests and bs4 libraries.
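from bs4 import BeautifulSoup
import requests
# placeholder URL: swap in the page you want to scrape
url = "https://example.com"
# basic request: download the page and keep its HTML as text
html = requests.get(url).text
# basic soup object: parse the HTML with Python's built-in parser
html_soup = BeautifulSoup(html, "html.parser")
print(html_soup)
# try it out, man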
0
I only need to use bs4 and BeautifulSoup.
0
Can you help me with this, sir?
0
My teacher said to import only bs4 and pathlib.
0
Can I have your contact number or email ID?
0
I will send you the code.
0
I couldn't help out with pathlib; I'd be learning right alongside you, haha. If I were you, I'd take the opportunity to get acquainted with pathlib first (I assume you'll be using it in place of requests). Once you know the basics of pathlib and can read in the HTML, just pass it to a BeautifulSoup object as shown above.
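For what it's worth, here is a rough sketch of the pathlib route, assuming the page has already been saved to disk. The file name sample_page.html and the sample markup are made up for illustration; the thread doesn't say what the teacher's file actually looks like.

```python
import tempfile
from pathlib import Path

from bs4 import BeautifulSoup

# Hypothetical file: a tiny sample page is written first so the sketch runs on its own.
page_path = Path(tempfile.gettempdir()) / "sample_page.html"
page_path.write_text(
    "<html><body><p class='addr'>Near City Mall, Pune 411001</p></body></html>",
    encoding="utf-8",
)

# Read the saved HTML with pathlib instead of downloading it with requests.
html = page_path.read_text(encoding="utf-8")

# Hand the markup to BeautifulSoup the same way as with a downloaded page.
html_soup = BeautifulSoup(html, "html.parser")

# Flatten the page to plain text, which is handy input for a regex later.
page_text = html_soup.get_text(" ", strip=True)
print(page_text)
```

With a real assignment you would skip the write_text step and point page_path at the file you were given.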
0
What code would you write to extract only the addresses?
0
I mean the approach to extracting them. By address I mean landmark, city, state, PIN code, and phone number.
0
I don't know, I never learned pathlib. But the approach can go several ways.
When bs4 is fed HTML, it can parse it and extract specific data through tags and other HTML identifiers. That's how I'd get the addresses.
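A small sketch of that tag-based idea, assuming each address sits in its own div with labelled spans. The class names (address, landmark, city, pincode) are invented for the example; a real page will use its own tags and attributes, so inspect the HTML first.

```python
from bs4 import BeautifulSoup

# Made-up markup: real pages will use different tags and class names.
html = """
<div class="address">
  <span class="landmark">Near City Mall</span>
  <span class="city">Pune</span>
  <span class="pincode">411001</span>
</div>
<div class="address">
  <span class="landmark">Opp. Central Park</span>
  <span class="city">Jaipur</span>
  <span class="pincode">302001</span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

addresses = []
for block in soup.find_all("div", class_="address"):
    # Collect the text of each labelled span inside one address block.
    addresses.append({
        field: block.find("span", class_=field).get_text(strip=True)
        for field in ("landmark", "city", "pincode")
    })

print(addresses)
```

If a span might be missing on some pages, check the result of find() for None before calling get_text().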
Phone numbers are simple: write a phone number regex and extract every match from the text. If you search through my code bits, you might find my phone number extractor. It scrapes any site and saves the results to a file (I can't remember how up to date the version on here is; it may just print them to the screen).
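This is not that extractor, just a minimal regex sketch of the same idea using only the standard re module. The patterns assume Indian-style numbers (10-digit mobiles starting with 6 to 9, landlines written as STD code, hyphen, number, and 6-digit PIN codes); adjust them for whatever formats your pages actually contain.

```python
import re

# Sample text standing in for the text pulled out of a page.
text = "Near City Mall, Pune 411001. Call 9876543210 or 020-26123456."

# Assumption: mobile numbers are 10 digits starting with 6, 7, 8 or 9.
MOBILE_RE = re.compile(r"(?<!\d)[6-9]\d{9}(?!\d)")
# Assumption: landlines look like 0 + area code, a hyphen, then 6 to 8 digits.
LANDLINE_RE = re.compile(r"(?<!\d)0\d{2,4}-\d{6,8}(?!\d)")
# Assumption: a PIN code is a standalone 6-digit number that does not start with 0.
PIN_RE = re.compile(r"(?<![\d-])[1-9]\d{5}(?!\d)")

mobiles = MOBILE_RE.findall(text)
landlines = LANDLINE_RE.findall(text)
pins = PIN_RE.findall(text)

print("mobiles:", mobiles)
print("landlines:", landlines)
print("PIN codes:", pins)
```

The lookarounds such as (?<!\d) and (?!\d) keep a pattern from matching inside a longer run of digits, so a PIN code is not picked out of the middle of a phone number.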