0
How can I extract only known URLs?
For example, I have a text file like this:
https://www.site.com/part1/part3/...........
https://www.site.com/part1/part2/...........
https://www.site.com/part1/part3/...........
I want to extract only the URLs like this one:
https://www.site.com/part1/part3/...........
and not the part2 ones. How can I do this? Thank you, friends. I have been looking for 2 days and can't find anything :(
3 Answers
+ 5
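This pattern should work for you. It uses a negative lookahead, so a link is rejected only when part2 is the segment directly after /part1/:
import re
pat = r'/part1/(?!part2/)'
m = re.search(pat, link)
if m:
    pass  # then select
else:
    pass  # reject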
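As a minimal sketch of how a select/reject check like this might be applied to a whole list of links: the version below uses a negative lookahead to skip anything whose segment after /part1/ is exactly part2. The name `select_links` and the sample URLs are made up for illustration, and it assumes you already have the links as separate strings.

```python
import re

# Hypothetical pattern: match /part1/ only when the next segment is not "part2".
NOT_PART2 = re.compile(r"/part1/(?!part2(?:/|$))")

def select_links(links):
    """Return the links that pass the select/reject check."""
    return [link for link in links if NOT_PART2.search(link)]

# Placeholder URLs invented for illustration.
sample = [
    "https://www.site.com/part1/part3/a",
    "https://www.site.com/part1/part2/b",
    "https://www.site.com/part1/part3/c/d",
]
print(select_links(sample))
```

Note that this keeps everything except part2, so a part4 link would also be selected. If you only ever want part3, matching on /part1/part3/ directly is simpler.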
+ 1
You should be able to use regex and match the URLs against
"https://www\.site\.com/part1/part3/.*"
That said, I don't know what you actually need, because you only gave us this example site with very limited test cases. Do you just want part3, or do you want everything except part2? Does part2 only appear after part1? Do you have non-URLs in your text file?
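If the file does turn out to contain non-URL text, one possible approach is to pull out anything that looks like a URL first and then check its path. This is only a sketch under assumptions: `extract_part3_urls` is a hypothetical helper, the sample text is invented, and the simple `https?://\S+` pattern assumes URLs are separated by whitespace and not followed by punctuation.

```python
import re
from urllib.parse import urlsplit

def extract_part3_urls(text, prefix="/part1/part3/"):
    """Find URL-like tokens in free text and keep those whose path starts with `prefix`."""
    candidates = re.findall(r"https?://\S+", text)
    return [u for u in candidates if urlsplit(u).path.startswith(prefix)]

# Placeholder text invented for illustration.
text = (
    "see https://www.site.com/part1/part3/a.jpg and "
    "https://www.site.com/part1/part2/b.jpg plus some notes\n"
    "https://www.site.com/part1/part3/c.jpg"
)
print(extract_part3_urls(text))
```

Checking the parsed path instead of the whole string means part2 appearing elsewhere, for example in a query string, would not affect the result.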
+ 1
Thank you both. You gave me ideas. I figured it out.
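import re
# text_file holds the contents of the input file as one string
urls = re.findall(r'https://(?:www\.)?site\.com/part1/part3/\S*\.jpg', text_file)
with open("a.txt", "a") as out:
    for url in urls:
        print(url, file=out)  # write each matched URL, not the whole text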