+ 1

Regex findall

I need help making this simple piece of code work. There are 3 images in the code, but the regex keeps finding only 2. I know something is wrong with the pattern. I have tried ^ and $, but they don't work. Help. Thanks https://code.sololearn.com/cd22U2AyX7ix/?ref=app

17th Mar 2021, 10:11 AM
Tomiwa Joseph
6 Answers
+ 4
You must make your '.*' non-greedy by appending '?' to it in your regex: pattern = re.compile(r'<img.*?/>')
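A minimal sketch of the greedy vs. non-greedy difference, in case it helps. The sample string and file names here are made up for illustration, not the asker's original text:

```python
import re

# Hypothetical sample: two img tags on the same line.
html = '<p><img alt="" src="a.png" /><img alt="" src="b.png" /></p>'

# Greedy: '.*' runs to the LAST '/>' on the line, swallowing both tags.
greedy = re.findall(r'<img.*/>', html)

# Non-greedy: '.*?' stops at the FIRST '/>' it can reach, one tag per match.
lazy = re.findall(r'<img.*?/>', html)

print(len(greedy))  # 1
print(len(lazy))    # 2
```

With the greedy version, the single match covers everything from the first '<img' to the final '/>', which is why the count comes out too low.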
17th Mar 2021, 11:01 AM
visph
+ 2
To be able to match img tags that span multiple lines, you should replace '.' with '[\s\S]': pattern = re.compile(r'<img[\s\S]*?/>')
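A small sketch of why this matters, again on a made-up sample string. It also shows the re.DOTALL flag, which is a standard alternative that lets '.' match newlines too:

```python
import re

# Hypothetical sample: the second tag is split across two lines.
html = '<img alt="" src="a.png" />\n<img alt=""\n src="b.png" />'

# '.' does not match a newline by default, so the split tag is missed.
dot_only = re.findall(r'<img.*?/>', html)

# '[\s\S]' matches any character at all, newlines included.
any_char = re.findall(r'<img[\s\S]*?/>', html)

# Same effect with the original '.' and the re.DOTALL flag.
dotall = re.findall(r'<img.*?/>', html, flags=re.DOTALL)

print(len(dot_only))       # 1
print(len(any_char))       # 2
print(any_char == dotall)  # True
```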
17th Mar 2021, 12:06 PM
visph
+ 1
import re

text = """
<p><img alt="" src="someurl.com" /></p>
<p><img alt="" src="someotherurl.com" /><img alt="" src="anotherurl.com"
/></p>
"""

pattern = re.compile(r'<img.*?\n?/>')
all_of_em = re.findall(pattern, text)
print(all_of_em)
print(len(all_of_em))

"""
What causes the unwanted output is that the last match starts on the 2nd line and ends on the 3rd line, so you have to take the newline character into consideration in your pattern.
The question mark (?) does two jobs here. After .* it makes the match non-greedy: as few characters as possible, where the dot stands for any character except a newline. After \n it means "with or without the previous character", so the pattern matches with or without a newline just before the closing />.
I hope you got the point.
"""
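One thing worth noting about this pattern, sketched below on two made-up inputs: the optional \n only helps when the line break sits directly before the closing '/>'. A break anywhere else inside the tag is still not matched.

```python
import re

pattern = re.compile(r'<img.*?\n?/>')

# Hypothetical inputs, not taken from the original code.
break_before_close = '<img alt="" src="a.png"\n/>'  # newline right before '/>'
break_mid_tag = '<img alt=""\nsrc="a.png" />'       # newline between attributes

print(len(pattern.findall(break_before_close)))  # 1
print(pattern.findall(break_mid_tag))            # []
```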
17th Mar 2021, 12:01 PM
iTech
+ 1
Of course not, iTech. Thanks, guys.
17th Mar 2021, 4:54 PM
Tomiwa Joseph
+ 1
I realise this thread is finished and you have your answer, but I thought I'd add another solution because it's quite useful to know: pattern = re.compile(r'<img[^>]*>'). This works because [^>] matches any character that isn't the tag-closing ">" character (a newline included), so a single match can never run past the end of one img tag.
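A quick sketch of that pattern on a made-up string with three tags, the last one broken across two lines. The file names are placeholders:

```python
import re

# Hypothetical sample: three img tags, the last one split across two lines.
html = (
    '<p><img alt="" src="a.png" /></p>\n'
    '<p><img alt="" src="b.png" /><img alt="" src="c.png"\n/></p>'
)

# '[^>]*' consumes everything up to the next '>', newlines included,
# so each match ends at the close of its own tag.
tags = re.findall(r'<img[^>]*>', html)

print(len(tags))  # 3
```

The one caveat is an attribute value that itself contains a ">" character, which would end the match early.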
27th Mar 2021, 4:21 PM
Russ