0

How do I extract known URLs?

For example, I have a text file like this:
https://www.site.com/part1/part3/...........
https://www.site.com/part1/part2/...........
https://www.site.com/part1/part3/...........
I want to extract only the links like this one:
https://www.site.com/part1/part3/...........
and not the part2 ones. How can I do this? 🧐
Thank you, friends 😸 I've been looking for 2 days and can't find anything :(

12th Mar 2019, 11:30 AM
Halil İbrahim Yalçın
3 Answers
+ 5
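This pattern should work for you:

    import re

    # Accept a link only if the segment right after /part1/ is not part2.
    # (A character class like [^2] is not enough here, because a later
    # path segment can still satisfy it, so use a negative lookahead.)
    pat = r'.*/part1/(?!part2/).*'
    m = re.search(pat, link)
    if m:
        pass  # then select the link
    else:
        pass  # reject the link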
12th Mar 2019, 11:47 AM
Шащи Ранжан
+ 1
You should be able to use a regex and match the URLs against "https://www.site.com/part1/part3/.*". That said, I don't know what you actually need, because you only gave us this one example site with very limited test cases to fulfill. Do you just want part3, or do you want everything except part2? Does part2 only appear after part1? Do you have non-URLs in your text file?
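To make the difference between those two readings concrete, here is a minimal sketch. The sample text is hypothetical (the file names and the part4 link are made up for illustration), and it assumes URLs are separated by whitespace.

```python
import re

# Hypothetical sample text; the file names and the part4 link are invented.
text = (
    "https://www.site.com/part1/part3/a.jpg "
    "https://www.site.com/part1/part2/b.jpg "
    "https://www.site.com/part1/part3/c.jpg "
    "https://www.site.com/part1/part4/d.jpg"
)

# Reading 1: keep only the part3 links.
only_part3 = re.findall(r"https://www\.site\.com/part1/part3/\S*", text)

# Reading 2: keep every part1 link except those under part2
# (the negative lookahead rejects "part2/" right after "/part1/").
not_part2 = re.findall(r"https://www\.site\.com/part1/(?!part2/)\S*", text)

print(only_part3)
print(not_part2)
```

With this sample, the first list holds only the two part3 links, while the second also keeps the part4 link, so which pattern fits depends on the answers to the questions above.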
12th Mar 2019, 11:47 AM
Hatsy Rei
+ 1
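Thank you both. You gave me ideas, and I figured it out:

    import re

    # text_file holds the contents of the text file as a string
    urls = re.findall(r'https://site\.com/part1/part3/\S*\.jpg', text_file)
    # Open the output file once and append each matched URL on its own line
    with open("a.txt", "a") as out:
        for url in urls:
            print(url, file=out)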
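For anyone landing here later, the same idea can be wrapped into a small read-extract-write script. This is only a sketch under a few assumptions: the function names `extract_urls` and `save_urls` are hypothetical, the host may appear with or without `www.`, URLs contain no whitespace, and the input file is UTF-8.

```python
import re
from pathlib import Path

# Assumed pattern: part3 links ending in .jpg, with an optional "www." prefix.
PART3_JPG = re.compile(r"https://(?:www\.)?site\.com/part1/part3/\S*?\.jpg")


def extract_urls(text):
    """Return all part3 .jpg URLs found in text, in order of appearance."""
    return PART3_JPG.findall(text)


def save_urls(src, dst):
    """Read src, extract matching URLs, and write them to dst, one per line."""
    urls = extract_urls(Path(src).read_text(encoding="utf-8"))
    Path(dst).write_text("".join(u + "\n" for u in urls), encoding="utf-8")
    return len(urls)
```

A call such as `save_urls("urls.txt", "a.txt")` (where `urls.txt` is a placeholder input name) would write the matches to `a.txt`. Note that this sketch overwrites the output file on each run rather than appending to it.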
12th Mar 2019, 1:02 PM
Halil İbrahim Yalçın