Python * metacharacter
What's the point of using the asterisk (*) metacharacter when it only needs "zero or more repetitions" to succeed? Doesn't it just always succeed? For example, re.match(r"(text)*", "spam") will return a match regardless of the second argument.
2 answers
A regular expression describes the type of string it can match.
(text)* will match an empty string since that is 0 repetitions of 'text'.
(text)* will also match 'text', 'texttext', 'texttexttext'...
The match function looks for a match of the expression at the beginning of the specified string. If it can't find a non-empty match there and the regex can match an empty string, a match for the empty string is returned.
Run this script to see what I mean:
import re

# Zero or more repetitions of the group "text"
rexpr = r"(text)*"
cases = ['', 'text', 'texttext', 'texttexttext', 'ttext', 'hello', 'world', 'hello text', 'text texttext']
for case in cases:
    print('Calling match for: ' + case)
    # Always prints a match object; its matched text may be empty
    print(re.match(rexpr, case))
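To make the printed match objects easier to read, here is a minimal sketch that pulls out the matched text and its span. The helper name describe is my own, not something from the re module.

```python
import re

pattern = re.compile(r"(text)*")

def describe(s):
    """Return (matched_text, span) for (text)* matched at the start of s."""
    m = pattern.match(s)
    # (text)* can always match zero repetitions, so m is never None here.
    return m.group(0), m.span()

for s in ["", "text", "texttext", "ttext", "hello text"]:
    print(repr(s), "->", describe(s))
```

The match object is always there, but for inputs like "ttext" or "hello text" the matched text is the empty string with span (0, 0), so checking m.group(0) tells you more than checking whether a match exists.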
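With re.match, the expression is only tried at the beginning of the string (re.search is the function that looks anywhere in it), so your expression always finds at least the empty match there. You can use characters like ^ and $ to make the position explicit: ^ pins the match to the beginning, $ pins it to the end, and both together force it to span the entire string.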
This must find a match at the beginning of the string:
r"^(text)*"
For "ttext", the only match for r"^(text)*" is the empty string: the pattern fails on the leading "tt", and the "text" only appears after that point.
r"^(text)*$"
Now, a match can be an empty string or any number of repetitions of "text", but if even a single character precedes or follows the repetitions of "text", there won't be a match.
"" would match. "t" will not match at all. "text" will match but "ttext" will not.
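The cases above can be checked with a short script. This is only a sketch: the helper name whole_string_matches is mine, and the comparison with re.fullmatch (available since Python 3.4) is an extra I added to show another way to require a whole-string match.

```python
import re

anchored = re.compile(r"^(text)*$")

def whole_string_matches(s):
    """True if s consists only of zero or more repetitions of 'text'."""
    return anchored.match(s) is not None

cases = ["", "t", "text", "ttext", "texttext", "text text"]
for s in cases:
    # re.fullmatch needs no anchors: it only succeeds if the whole string matches.
    # (One difference: $ also matches just before a trailing newline.)
    print(repr(s), whole_string_matches(s), re.fullmatch(r"(text)*", s) is not None)
```

Here the result finally depends on the input: only "", "text" and "texttext" match, which is the behaviour the question was expecting from * in the first place.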
The best way to learn regular expressions is by practicing with scripts like the one I shared here, using many cases. It also helps to learn how finite state machines can be converted to and from regular expressions.
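As a rough illustration of the state machine idea, here is a hand-written machine that accepts the same strings as r"^(text)*$". It is my own sketch, not how the re module works internally, and the function name accepts_text_star is made up for this example.

```python
import re

def accepts_text_star(s):
    """Hand-written state machine for whole strings of the form (text)*."""
    expected = "text"  # in state i, the next character must be expected[i]
    state = 0          # state 0 is both the start state and the only accepting state
    for ch in s:
        if ch != expected[state]:
            return False  # no transition for this character: reject
        state = (state + 1) % len(expected)  # after the final 't', loop back to state 0
    return state == 0

for s in ["", "t", "text", "ttext", "texttext"]:
    print(repr(s), accepts_text_star(s), re.fullmatch(r"(text)*", s) is not None)
```

The * shows up in the machine as the loop back to state 0, and the empty string is accepted because the start state is already an accepting state.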
Josh Greig well, now it makes sense. It should be explained more precisely in the course...