What does "r" mean? What is the difference between "+" and "*"?
import re

pattern = r"g+"
if re.match(pattern, "g"):
    print("Match 1")
if re.match(pattern, "gggggggggggggg"):
    print("Match 2")
if re.match(pattern, "abc"):
    print("Match 3")
2 Answers
To avoid confusion when working with regular expressions, write patterns as raw strings, such as r"expression".
In a raw string, Python does not treat backslashes as escape characters, so they reach the regex engine unchanged. That makes regular expressions easier to write.
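A small sketch of why this matters, using \b as the example (the sample text and variable names are made up for illustration). In an ordinary string Python converts \b into a backspace character, while in a raw string the regex engine receives \b and reads it as a word boundary:

```python
import re

text = "a cat sat"

# Without the r prefix, Python turns \b into a backspace character (\x08)
# before the regex engine ever sees the pattern.
plain = "\bcat\b"

# With the r prefix the backslashes survive, and the regex engine
# interprets \b as a word boundary.
raw = r"\bcat\b"

print(len(plain), len(raw))          # 5 7
print(re.search(plain, text))        # None
print(re.search(raw, text).group())  # cat
```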
The metacharacter * means "zero or more repetitions of the previous thing". It tries to match as many repetitions as possible. The "previous thing" can be a single character, a class, or a group of characters in parentheses.
The metacharacter + is very similar to *, except it means "one or more repetitions", as opposed to "zero or more repetitions".
The metacharacter ? means "zero or one repetitions".
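Here is a quick sketch comparing the three metacharacters. It uses re.fullmatch, which only succeeds if the whole string matches, so the repetition limits are easy to see. The test strings are arbitrary:

```python
import re

# fullmatch requires the whole string to match the pattern.
print(bool(re.fullmatch(r"g*", "")))    # True: zero repetitions are allowed
print(bool(re.fullmatch(r"g+", "")))    # False: at least one "g" is required
print(bool(re.fullmatch(r"g?", "g")))   # True
print(bool(re.fullmatch(r"g?", "gg")))  # False: at most one "g"

# Repetition is greedy: it consumes as many "g" characters as it can.
print(re.match(r"g*", "gggh").group())  # ggg

# The "previous thing" can also be a class or a parenthesized group.
print(bool(re.fullmatch(r"[ab]+", "abba")))    # True
print(bool(re.fullmatch(r"(ab)+", "ababab")))  # True
print(bool(re.fullmatch(r"(ab)+", "abba")))    # False

# Applied to the question's code: "g+" does not match "abc",
# but "g*" does, because it matches the empty string at the start.
print(re.match(r"g+", "abc"))                # None
print(repr(re.match(r"g*", "abc").group()))  # ''
```

So in the question's code only "Match 1" and "Match 2" are printed. With the pattern r"g*", "Match 3" would be printed as well.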
Curly braces can be used to represent the number of repetitions between two numbers.
The regex {x,y} means "between x and y repetitions of something".
Hence {0,1} is the same thing as ?.
If the first number is missing, it is taken to be zero. If the second number is missing, it is taken to be infinity.
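The curly-brace rules can be checked with a short sketch. The helper name matches is made up for this example and simply wraps re.fullmatch:

```python
import re

def matches(pattern, text):
    """Hypothetical helper: True if the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

print(matches(r"g{2,4}", "g"))      # False: fewer than 2 repetitions
print(matches(r"g{2,4}", "ggg"))    # True
print(matches(r"g{2,4}", "ggggg"))  # False: more than 4 repetitions

# Missing first number: the lower bound is zero.
print(matches(r"g{,2}", ""))        # True

# Missing second number: no upper bound.
print(matches(r"g{2,}", "g" * 50))  # True

# {0,1} behaves like ?.
print(matches(r"g{0,1}", "g") == matches(r"g?", "g"))    # True
print(matches(r"g{0,1}", "gg") == matches(r"g?", "gg"))  # True
```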
The 'r' prefix tells Python not to interpret backslashes in the string literal as escape sequences, so a sequence such as \n is not replaced by a special character. The "r" stands for "raw".
From Python documentation: https://docs.python.org/3/library/re.html
Regular expressions use the backslash character ('\') to indicate special forms or to allow special characters to be used without invoking their special meaning. This collides with Python's usage of the same character for the same purpose in string literals; for example, to match a literal backslash, one might have to write '\\\\' as the pattern string, because the regular expression must be \\, and each backslash must be expressed as \\ inside a regular Python string literal.
The solution is to use Python's raw string notation for regular expression patterns; backslashes are not handled in any special way in a string literal prefixed with 'r'. So r"\n" is a two-character string containing '\' and 'n', while "\n" is a one-character string containing a newline. Usually patterns will be expressed in Python code using this raw string notation.
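The points from the documentation can be verified directly. This is only an illustrative sketch, and the Windows-style path is an invented sample value:

```python
import re

# "\n" is one character (a newline); r"\n" is two: a backslash and "n".
print(len("\n"), len(r"\n"))  # 1 2

# Both literals spell the same two-character regex \\,
# which matches a single literal backslash.
print("\\\\" == r"\\")        # True

path = r"C:\temp"             # sample value containing one backslash
match = re.search(r"\\", path)
print(match.group())          # \
print(match.start())          # 2
```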