How do I extract only URLs with a known path?

Question

For example, I have a text file like this:

https://www.site.com/part1/part3/...........
https://www.site.com/part1/part2/...........
https://www.site.com/part1/part3/...........

I only want to extract lines like this one:

https://www.site.com/part1/part3/...........
and not the part2 ones.

How can I do this? 🧐 Thank you, friends 😸

I've been looking for two days and can't find anything :(

Answer

This pattern should work for you:

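import re

# The path segment right after /part1/ must not end in "2":
# [^/]* stays inside that one segment, [^2/] is its last character.
pat = r'.*/part1/[^/]*[^2/]/.*'

link = "https://www.site.com/part1/part3/a.jpg"  # hypothetical example link
m = re.search(pat, link)
if m:
    print("select", link)  # then select
else:
    print("reject", link)  # reject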
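To run this check over every line of the file instead of a single link, the test can be wrapped in a loop. This is a minimal sketch assuming one URL per line; the helper name filter_links and the sample URLs are hypothetical. It uses [^/]*[^2/] so the "not 2" check stays inside the segment right after part1, which also rejects longer part2 paths like /part1/part2/x/y.jpg.

```python
import re

# Segment after /part1/ must not end in "2"; [^/]* keeps the check inside that segment.
PAT = re.compile(r'.*/part1/[^/]*[^2/]/.*')


def filter_links(lines):
    """Hypothetical helper: return the stripped lines that match PAT."""
    selected = []
    for line in lines:
        link = line.strip()  # drop the trailing newline from file lines
        if PAT.search(link):
            selected.append(link)  # then select
        # otherwise: reject (the line is skipped)
    return selected


# Hypothetical sample lines, standing in for the file's contents.
sample = [
    "https://www.site.com/part1/part3/a.jpg\n",
    "https://www.site.com/part1/part2/b.jpg\n",
]
print(filter_links(sample))  # ['https://www.site.com/part1/part3/a.jpg']
```

With a real file, calling filter_links(f) inside a "with open("urls.txt") as f:" block works the same way, since iterating a file yields its lines ("urls.txt" is a placeholder name).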

Answer

You should be able to use a regex and match the URLs against

"https?://www\.site\.com/part1/part3/.*"

That said, I don't know what you actually need, because you only gave us this example site with very few test cases. Do you want just part3, or everything except part2? Does part2 only ever appear right after part1? Are there non-URL lines in your text file?
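The two readings asked about here ("just part3" versus "everything except part2") lead to slightly different patterns. A minimal sketch of both, assuming one entry per line and that www.site.com is the real host; the helper keep and the sample lines are hypothetical. The second pattern uses a negative lookahead, (?!part2/), which only checks the segment directly after part1.

```python
import re

# Reading 1: keep only part3 URLs.
ONLY_PART3 = re.compile(r'https?://www\.site\.com/part1/part3/')
# Reading 2: keep every part1 URL except part2.
# (?!part2/) fails the match if "part2/" comes right after /part1/.
NOT_PART2 = re.compile(r'https?://www\.site\.com/part1/(?!part2/)')


def keep(lines, pattern):
    """Hypothetical helper: return the lines that start with a match of pattern."""
    return [line for line in lines if pattern.match(line)]


# Hypothetical sample lines, including one non-URL line.
lines = [
    "https://www.site.com/part1/part3/a.jpg",
    "https://www.site.com/part1/part2/b.jpg",
    "https://www.site.com/part1/part4/c.jpg",
    "not a url",
]
print(keep(lines, ONLY_PART3))  # ['https://www.site.com/part1/part3/a.jpg']
print(keep(lines, NOT_PART2))   # the part3 and part4 lines
```

Using pattern.match (anchored at the start of the line) means non-URL lines are skipped automatically.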

Answer

Thank you both. You gave me ideas. I figured it out.

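import re

with open("urls.txt") as f:  # input file name is an assumption
    text = f.read()  # re.findall needs a string, not a file object
urls = re.findall(r'https://(?:www\.)?site\.com/part1/part3/\S*\.jpg', text)
with open("a.txt", "a") as out:
    for url in urls:
        print(url, file=out)  # write each matching URL, not the whole file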
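If the regex gets awkward, the standard library's urllib.parse.urlsplit can split each URL so the path segments are compared directly. This is an alternative to the regex approach, not what was used above; the function is_wanted and its rule (https, www.site.com, a path starting with /part1/part3/, ending in .jpg) are assumptions based on the example.

```python
from urllib.parse import urlsplit


def is_wanted(url):
    """Hypothetical rule: https, host www.site.com, path /part1/part3/..., ends in .jpg."""
    parts = urlsplit(url.strip())
    # "/part1/part3/a.jpg" -> ["", "part1", "part3", "a.jpg"]
    segments = parts.path.split("/")
    return (
        parts.scheme == "https"
        and parts.netloc == "www.site.com"
        and segments[1:3] == ["part1", "part3"]
        and parts.path.endswith(".jpg")
    )


# Hypothetical sample URLs.
urls = [
    "https://www.site.com/part1/part3/a.jpg",
    "https://www.site.com/part1/part2/b.jpg",
    "https://www.site.com/part1/part3/readme.txt",
]
print([u for u in urls if is_wanted(u)])  # ['https://www.site.com/part1/part3/a.jpg']
```

Comparing segments avoids escaping dots or worrying about what ".*" might match across slashes.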
