0

Regex to match a currency string and return value without , chars

I am trying to write a regex that will match a currency string. The string is formatted like: $123,456.78 and I want the returned result to be: 123456.78

I want to achieve this using a named group construct in the regex, so I have developed this so far:

\$(?P<ItemCost>(\d{0,3}),*(\d{0,3})(\.\d{2}))

This creates a named capture group called ItemCost, but it returns the value WITH the comma character included, like: 2,175.00 whereas I would like the result to be: 2175.00

Is it possible to achieve this WITHOUT splitting the result into separate chunks? I'd really prefer the single named capture group to return the value with the , character removed. I can easily split the result on each side of the comma, but I would prefer not to. I would also like the regex to handle any number of thousands separators in the input currency string, so it would handle values like: $123,456,789.00 (or larger!)

12th Apr 2024, 1:05 AM
Nathan Stanley
7 Answers
+ 1
Yeah, let me try again.
14th Apr 2024, 12:00 PM
`ᴴᵗᵗየ
0
You can achieve this by using a regex pattern to match the currency string and then performing a substitution to remove the comma characters. Here's how you can do it:

Regex Pattern: \$((?:\d{1,3},)*\d{1,3}(?:\.\d{2})?)

Explanation:
\$ : Matches a literal $ symbol (the backslash escapes it).
((?:\d{1,3},)*\d{1,3} : Matches the integer part of the currency. The inner group matches one to three digits followed by a comma and can repeat zero or more times, and it is followed by a final run of one to three digits. This allows handling of currency values with any number of comma-separated groups.
(?:\.\d{2})? : Matches the decimal part of the currency (if present), allowing for two digits after the decimal point.

Python code to perform the substitution: https://www.sololearn.com/fr/compiler-playground/cEEjsz6aKfIl

This code uses re.sub() with a lambda function to replace the matched currency string with the same string, but with commas removed from the integer part. This keeps a single capture group and handles any number of comma-separated groups in the input.
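The linked playground code is not reproduced in this thread, so the following is only a minimal sketch of the approach described, not the original code. The names CURRENCY and strip_commas are hypothetical, and dropping the leading $ from the result is an assumption based on the output the question asks for.

```python
import re

# Pattern quoted in the answer; group 1 holds the amount without the "$".
CURRENCY = re.compile(r"\$((?:\d{1,3},)*\d{1,3}(?:\.\d{2})?)")


def strip_commas(text: str) -> str:
    """Replace each $-amount in text with the amount minus "$" and commas."""
    # The lambda receives the match object; the commas are removed by
    # ordinary string post-processing, not by the regex itself.
    return CURRENCY.sub(lambda m: m.group(1).replace(",", ""), text)


print(strip_commas("$2,175.00"))
print(strip_commas("Total: $123,456,789.00"))
```

Note that the comma removal still happens in Python after the match, which is the post-processing step the question was hoping to avoid.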
12th Apr 2024, 1:20 AM
Abiye Gebresilassie Enzo Emmanuel
0
Hi Abiye Gebresilassie Enzo Emmanuel, thanks for your answer timestamped 12th Apr 2024, 1:20 AM. I understand I can post-process the result to remove the thousands separators (comma characters), but I would like to know if I can remove them WITHOUT post-processing AND do it in a named group construct solely within the regex system. I know I can easily remove the commas using regex if I DON'T use a named group construct, but I would like to use a named group construct for this.
12th Apr 2024, 1:29 AM
Nathan Stanley
0
To remove the commas while using named group constructs, you can modify the regex pattern to include a named group for the integer part and another named group for the decimal part. Then, you can join these two groups in the replacement string without the comma characters. Here's how you can do it:

Regex Pattern: (?P<dollars>\d{1,3}(?:,\d{3})*)\.(?P<cents>\d{2})

Explanation:
(?P<dollars>\d{1,3}(?:,\d{3})*) : Named group for the dollars part of the currency, matching one to three digits followed by zero or more groups of a comma and three digits.
\. : Matches the decimal point.
(?P<cents>\d{2}) : Named group for the cents part of the currency, matching exactly two digits.

Python code to perform the substitution: https://www.sololearn.com/en/compiler-playground/cIekGfJOXDap/?ref=app

This code uses re.sub() with a lambda function to join the dollars and cents groups. Because the dollars group still contains the commas, the lambda has to strip them from that group before joining it to the cents. This achieves the comma removal.
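Since the linked code is not shown here, this is a hedged sketch of what a two-group substitution could look like, assuming the pattern quoted above. AMOUNT, normalize and join_parts are hypothetical names chosen for illustration.

```python
import re

# Pattern quoted in the answer: separate named groups for dollars and cents.
AMOUNT = re.compile(r"(?P<dollars>\d{1,3}(?:,\d{3})*)\.(?P<cents>\d{2})")


def normalize(text: str) -> str:
    """Rewrite each amount in text with the commas removed."""

    def join_parts(m: re.Match) -> str:
        # The dollars group still contains the commas, so they are
        # stripped here before rejoining with the decimal point.
        dollars = m.group("dollars").replace(",", "")
        return f"{dollars}.{m.group('cents')}"

    return AMOUNT.sub(join_parts, text)


print(normalize("$2,175.00"))
print(normalize("$123,456,789.00"))
```

This pattern does not include the $ sign, so the $ is left in place in the output. The commas are again removed by str.replace() inside the callback rather than by the regex.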
12th Apr 2024, 1:53 AM
Abiye Gebresilassie Enzo Emmanuel
0
You can achieve the desired result by modifying your regex pattern to include a non-capturing group for the comma. Here's the updated regex pattern:

\$(?P<ItemCost>\d{1,3}(?:,\d{3})*(?:\.\d{2}))

This pattern uses a non-capturing group "(?:,\d{3})*" that matches zero or more occurrences of a comma followed by three digits. This allows the regex to handle currency strings with an unlimited number of comma-separated groups.
12th Apr 2024, 7:53 AM
`ᴴᵗᵗየ
0
Hi http406, the regex doesn't remove the commas from the currency string. See: https://regex101.com/r/skIUAr/1 Text matched by a non-capturing group is not excluded from the result of the named group that contains it.
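The same behaviour can be checked in Python. This is a small sketch using the pattern from the previous answer; the sample string and the variable names are made up for illustration.

```python
import re

# Pattern from the previous answer, with non-capturing groups inside ItemCost.
PATTERN = re.compile(r"\$(?P<ItemCost>\d{1,3}(?:,\d{3})*(?:\.\d{2}))")

sample = "Price: $2,175.00"
m = PATTERN.search(sample)

# The named group returns everything matched between its parentheses,
# including the text matched by the nested non-capturing groups.
item_cost = m.group("ItemCost")
print(item_cost)  # 2,175.00

# A group is just a start/end span over the input, so it cannot skip characters.
start, end = m.span("ItemCost")
print(sample[start:end] == item_cost)  # True
```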
14th Apr 2024, 11:38 AM
Nathan Stanley
0
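I am fairly sure you can't achieve what I wanted, which is a little surprising. If you use named groups, anything matched by a non-capturing group inside the named group is always returned as part of the named group, because a group returns one contiguous piece of the input.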
25th Apr 2024, 10:31 PM
Nathan Stanley