+ 1
How to filter and modify Beautiful Soup results in Python, before or after writing them to .txt/.csv
Hi, I wrote a Python script that scrapes data from a web page with Beautiful Soup. The code works, but some <div ...> markup shows up in the output. I want to filter those lines out, or edit them by deleting or replacing text. I have tried a lot of .replace calls without success. Any help?
4 Answers
+ 3
Beautiful Soup parses the HTML document into a tree, so you can walk down that tree to the tags' "clean" text values instead of editing raw markup. Open the page's HTML source and look at how the data you want is nested.
You shouldn't use .replace unless you operate on the values themselves or the HTML is garbled beyond comprehension.
If you have problems, please share the code you wrote, so we can check it out.
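In the meantime, here is a rough sketch of what I mean by digging down to the clean values. The HTML string, the "ad" class and the text in it are made up for illustration, so your page will need different tag names. The idea is to drop the unwanted nested tags from the tree with decompose() and then ask for the text only with get_text(), rather than calling .replace on the markup.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for the real page (assumption, not the asker's site).
html = """
<div id="information">
  <h1>Widget</h1>
  <div class="ad">Buy now!</div>
  <p>Price: 10 USD</p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
box = soup.find("div", id="information")

# Remove the unwanted nested tags from the tree before extracting text.
for unwanted in box.find_all("div", class_="ad"):
    unwanted.decompose()

# get_text() returns text only; strip=True trims each piece and drops
# whitespace-only pieces, separator=" " joins what is left.
clean = box.get_text(separator=" ", strip=True)
print(clean)
```

If tags still show up in your output, you are probably printing the tag object itself (or str(tag)) instead of its text.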
+ 1
Can you give the code? We can't help you without the code.
+ 1
OK, I ran into two problems.
The first is the one above. The second is that I want to read links from a CSV file and loop over them in my code.
If you can help, this is the code:
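import re
import sys
from urllib.request import urlopen
from bs4 import BeautifulSoup

quote_page = ["my link"]
data = []
for pg in quote_page:
    page = urlopen(pg)
    soup = BeautifulSoup(page, "html.parser")
    name_box = soup.find("div", attrs={'id': 'information'})
    name = name_box.text.strip()
    # split on any run of whitespace, then rejoin the pieces
    name = "".join(re.split(r"\s+", name, flags=re.UNICODE))
    sys.stdout.write(name)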
+ 1
No 1.
You don't need the attrs= keyword: the dictionary can be passed as the second positional argument, e.g. soup.find("div", {'id': 'information'}).
No 2.
We also need to see the HTML structure you are parsing.
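As for your second problem (looping over links stored in a CSV file), the standard csv module is probably enough. A minimal sketch, assuming the file has one URL per row in the first column. The file name links.csv and the example.com URLs are placeholders I made up, and io.StringIO only stands in for the real file so the snippet runs on its own.

```python
import csv
import io

# Stand-in for: open("links.csv", newline="")
# (hypothetical file name and URLs, replace with your own).
csv_file = io.StringIO("https://example.com/a\nhttps://example.com/b\n\n")

quote_page = []
for row in csv.reader(csv_file):
    # Skip blank rows; take the first column as the link.
    if row and row[0].strip():
        quote_page.append(row[0].strip())

for pg in quote_page:
    # This is where urlopen(pg) and the BeautifulSoup parsing would go.
    print(pg)
```

With a real file you would wrap the reading part in "with open(...) as csv_file:" so the file gets closed afterwards.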