+ 1
Regular Expressions and Unicode For Dummies (aka Me)
Hey everyone, so I'm tackling the spy code coach, and I need to flip a string around (pretty easy), remove symbols (not so easy, but I can pull a line of code from a past project to do this), and remove numbers. Here is the code I frankensteined out of another project:
sentence_nopunct = re.sub(r'[^\w\s]', '', sentence, flags=re.UNICODE)
As 99% of the people reading this likely know, this line removes non-alphanumeric characters (apart from whitespace) from a string. My question is: can anyone give me a quick explanation of what is going on in this line, and how I might alter it to filter out numeric characters as well? (Side note: I forgot to change sentence_nopunct and sentence to something more relevant, so feel free to change those to whatever you see fit.)
5 Answers
+ 5
Randy Ziegler, square brackets [ ... ] define a character class: the pattern matches any single character listed inside them.
However, placing a caret as the first character inside the brackets, [^ ... ], inverts the class so that it matches any character except those listed.
a-z inside the brackets is a range covering all lowercase letters from "a" through "z".
A-Z does the same for uppercase letters.
\w matches any word character: letters, digits, and the underscore.
\s matches any whitespace character, such as spaces, tabs, and line breaks.
\d matches any numeric digit.
You can mess around with these patterns in the code I just created for you to see how these work.
https://code.sololearn.com/c3EroQTAStU7/?ref=app
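In case the link is not handy, here is a minimal sketch of how those pieces could be combined in Python to drop digits as well as symbols. The function names and sample strings are made up for illustration, and this is just one possible approach. Two things worth knowing: the fourth positional parameter of re.sub is count, so flags are safest passed by keyword, and in Python 3 str patterns are already Unicode-aware, so re.UNICODE is redundant there.

```python
import re

def strip_symbols(text):
    # [^\w\s] matches any character that is neither a word character
    # (letter, digit, underscore) nor whitespace, so only symbols go.
    return re.sub(r'[^\w\s]', '', text, flags=re.UNICODE)

def strip_symbols_and_digits(text):
    # Same idea, plus a second alternative, [\d_], that also removes
    # digits and the underscore (both of which \w would otherwise keep).
    return re.sub(r'[^\w\s]|[\d_]', '', text)

sample = "H3llo, w0rld_!"
print(strip_symbols(sample))             # H3llo w0rld_
print(strip_symbols_and_digits(sample))  # Hllo wrld
```

If only plain English letters are expected, a narrower class such as [^a-zA-Z\s] would do the same job, at the cost of also stripping accented letters.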
+ 2
In addition to playing with the code from David Carroll, I suggest using https://regex101.com/ to test your expressions.
+ 1
Well, this looks like I'm getting in a bit over my head as a beginner, so I'm just going to try something else lmao
+ 1
Thanks so much, I kind of moved on to other things but maybe later I'll crack down on this