+ 8
Python, RegEx, how to eliminate empty strings in regex call? see description
I have attached some example code here: https://code.sololearn.com/cu6QxKQm6gJT/?ref=app I want the result to be a list of strings that contains no empty strings. I have achieved this with the list comprehension in line 5 of the code. How would I have to modify the regex in line 3 to get the same result without the list comprehension? The code should be versatile enough that the string t can begin and end with either a group of letters or a group of digits.
9 Answers
+ 2
Jan Markus I don't know what your constraints are, so in case you can use sub(), I found another possibility: insert a space at every boundary between a group of letters and a group of digits, then split on whitespace.
import re
t = 'abc345def678ghi910'
r = re.sub(r'(?<=\d)(?=\D)|(?<=\D)(?=\d)', r' ', t).split()
+ 7
Jan Markus
This is a very interesting question 😃
I have been on this for a while now. I tried groups, groupdict and so on, and the only ways I found are that either you remove the empty strings manually:
manual = [x for x in re.split(r"(\d+)",t) if x]
print(manual)
or...use findall
way = re.findall(r"\D+|\d+", t)
print(way)
Awaiting a better answer though😃😃
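As a quick check of the findall approach against the requirement that t may begin or end with either letters or digits, here is a minimal sketch. The sample strings are made up for illustration and are not from the original code.

```python
import re

# Hypothetical inputs covering the different start/end combinations.
samples = ["abc345def678ghi910", "123abc456", "abc123def", "123abc"]

# Match either a run of digits or a run of non-digits.
pattern = re.compile(r"\d+|\D+")

results = {s: pattern.findall(s) for s in samples}
for s, parts in results.items():
    print(s, "->", parts)
```

Because findall only returns what the pattern matched, and both alternatives require at least one character, no empty strings can appear in the result.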
+ 6
Jan Markus The problem with using re.split() isn't in the regex, but the purpose of the split() method, which is to separate values based on matching delimiters. In this case, the delimiter is a regex pattern.
Consider attempting to split on a single character, like "," applied to the string value: ",a,b,". The split() method focuses on taking the left and right values of each delimiter. If no value exists, the method returns an empty string.
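A small sketch of that behaviour, using the ",a,b," example from above and the string t from earlier in the thread (the variable names are only for illustration):

```python
import re

# Plain string split: nothing to the left of the first comma and nothing
# to the right of the last one, so both ends yield an empty string.
plain = ",a,b,".split(",")
print(plain)  # ['', 'a', 'b', '']

# Same effect with re.split: t ends with a delimiter match (the digits),
# so there is an empty string to its right.
t = 'abc345def678ghi910'
parts = re.split(r'(\d+)', t)
print(parts)  # ['abc', '345', 'def', '678', 'ghi', '910', '']

# A string that starts with digits gets the empty string at the front.
leading = re.split(r'(\d+)', '123abc')
print(leading)  # ['', '123', 'abc']
```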
For this reason, I don't consider re.split() to be a traditional regex method. It's something added by the language or library as a convenience method.
The more "pure regex" approach would be to match each run of non-digits and each run of digits, as demonstrated by Tomiwa Joseph.
In the meantime, as you've mentioned being new to regex, just know that Python adds its own "Pythonic" or opinionated regex methods that aren't always consistent with the behaviors of other languages.
For that reason, don't fall into the trap of using Python's regex as the standard of measure.
+ 4
Jan Markus The groups that start (?=...) and (?<=...) are called lookarounds. There is a fairly detailed explanation of them here: https://www.rexegg.com/regex-lookarounds.html with tips on how to use them. As I say, it is quite in-depth so it may be a touch deep for a beginner but give it a go. Happy learning!
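To see what the lookarounds in Russ's pattern actually match, here is a rough sketch. It marks each zero-width match with a "|" instead of a space, and then, assuming Python 3.7 or later (where re.split accepts patterns that can match an empty string), splits directly on those positions. The name boundary is just for illustration.

```python
import re

t = 'abc345def678ghi910'

# Zero-width positions: digit on the left and non-digit on the right,
# or non-digit on the left and digit on the right.
boundary = r'(?<=\d)(?=\D)|(?<=\D)(?=\d)'

# Make the matched positions visible.
marked = re.sub(boundary, '|', t)
print(marked)  # abc|345|def|678|ghi|910

# Assumption: Python 3.7+, which allows splitting on empty matches.
parts = re.split(boundary, t)
print(parts)  # ['abc', '345', 'def', '678', 'ghi', '910']
```

The pattern cannot match at the very start or end of the string, because there is no character on one side for the lookaround to see, so no empty strings are produced whichever kind of group t begins or ends with.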
+ 3
You could add a character you know you won't have in the string to the end, then remove the last item from the list. Hardly a decent solution though.
r = re.split(r'(\d+)', t+',')[:-1]
Would recommend using findall() if you can as Tomiwa Joseph said.
+ 2
Russ
Thank you for your code. It shows exactly the behaviour I wanted.
I will now try to understand how it works. Some parts seem a little strange to me.
Is there any literature where one could learn these magic secrets in depth?
+ 1
David
That only works if I can be sure that the string ends with digits, and I cannot be sure of that.
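A short sketch of the cases in question. The helper name split_with_sentinel and the sample strings are made up here; the expression inside is the one David posted.

```python
import re

def split_with_sentinel(t):
    # David's suggestion: append a ',' and drop the last list item.
    return re.split(r'(\d+)', t + ',')[:-1]

# Ends with digits: the ',' becomes its own last item and is dropped.
ends_with_digits = split_with_sentinel('abc345def678')
print(ends_with_digits)  # ['abc', '345', 'def', '678']

# Ends with letters: the last item is 'def,', so 'def' is lost too.
ends_with_letters = split_with_sentinel('abc345def')
print(ends_with_letters)  # ['abc', '345']

# Starts with digits: the leading empty string is still there.
starts_with_digits = split_with_sentinel('123abc456')
print(starts_with_digits)  # ['', '123', 'abc', '456']
```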
+ 1
Tomiwa Joseph
Your way is acceptable, and I will do it this way as long as I am a noob in the regex field.
I did hope to be able to solve it with pure regex, but maybe later.
Thank you all for your efforts.
https://code.sololearn.com/ckUv3jsVCZ60/?ref=app
0
Jan Markus
In line 4
print(r[:-1])