
Regex (regular expression) help needed as soon as possible!

statements = ['\n\t \t\tą¤øą¤¤ą„ą¤¤ą¤¾ą¤°ą„ą¤¢ ą¤¦ą¤²ą¤•ą„‹ ą¤µą¤æą¤µą¤¾ą¤¦ ą¤Ŗą¤Øą¤æ ą¤²ą¤•ą¤”ą¤¾ą¤‰ą¤Øą¤®ą¤¾\n\n\t \t\t\t \t', '\n\t \t\tą¤•ą¤² ą¤Ÿą„ą¤Æą¤¾ą¤Ŗą¤æą¤™ ą¤—ą¤°ą„ą¤Ø ą¤Ŗą¤¾ą¤‰ą¤Øą„‡ ą¤µą¤æą¤§ą„‡ą¤Æą¤• ą¤Ŗą¤¾ą¤°ą¤æą¤¤\n\n\t \t\t\t \t', '\n\t \t\tą¤•ą„‹ą¤°ą„‹ą¤Øą¤¾ ą¤…ą¤øą¤° : ą¤øą¤¬ą„ˆą¤­ą¤Øą„ą¤¦ą¤¾ ą¤¬ą¤¢ą„€ ą¤Æą„ą¤ą¤ˆą¤®ą¤¾ ą¤Øą„‡ą¤Ŗą¤¾ą¤²ą„€ą¤²ą„‡ ą¤°ą„‹ą¤œą¤—ą¤¾ą¤°ą„€ ą¤—ą„ą¤®ą¤¾ą¤‰ą¤ą¤¦ą„ˆ\n\n\t \t\t\t \t', '\n\t \t\tą¤²ą¤æą¤Ŗą„ą¤²ą„‡ą¤• ą¤œą¤¾ą¤Ø ą¤¹ą¤æą¤ą¤”ą„‡ą¤•ą¤¾ ą¤†ą¤  ą¤œą¤Øą¤¾ ą¤Ŗą¤•ą„ą¤°ą¤¾ą¤‰\n\n\t \t\t\t \t'] so I have a list of data that is written in Nepali Language. All I want to is get rid of the other things in these statements and only retrieve the main data written in Nepali. I want to use regex(regular expression) and I want it in similar format like: pattern = re.compile(r'______') #pattern for the statements for each in statements: matches = re.finditer(pattern, each) for match in matches: print(match.group(1)) #where group 1 or any other number means the main data or you can make your own something else...help needed!

21st May 2020, 5:14 PM
Gaurav Giri
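A minimal sketch in the `re.finditer` / `match.group(1)` format asked for above, assuming each string holds one headline on a single line with only whitespace around it (the list is shortened to two of the four posted strings). Group 1 runs from the first to the last non-whitespace character, so it does not care which language the text is in:

```python
import re

# Two of the strings from the question, whitespace kept exactly as posted.
statements = [
    '\n\t \t\tसत्तारुढ दलको विवाद पनि लकडाउनमा\n\n\t \t\t\t \t',
    '\n\t \t\tकोरोना असर : सबैभन्दा बढी युएईमा नेपालीले रोजगारी गुमाउँदै\n\n\t \t\t\t \t',
]

# \s* skips the surrounding whitespace. Group 1 starts at a non-whitespace
# character and, via the optional (?:.*\S), extends to the last non-whitespace
# character on the same line ('.' does not match '\n' without re.DOTALL).
pattern = re.compile(r'\s*(\S(?:.*\S)?)\s*')

results = []
for each in statements:
    for match in re.finditer(pattern, each):
        print(match.group(1))
        results.append(match.group(1))
```

If a string ever contains several lines of text, each non-blank line comes back as its own match, because `.` stops at `\n` unless `re.DOTALL` is set.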
statements = ['\n\t \t\tą¤øą¤¤ą„ą¤¤ą¤¾ą¤°ą„ą¤¢ ą¤¦ą¤²ą¤•ą„‹ ą¤µą¤æą¤µą¤¾ą¤¦ ą¤Ŗą¤Øą¤æ ą¤²ą¤•ą¤”ą¤¾ą¤‰ą¤Øą¤®ą¤¾\n\n\t \t\t\t \t', '\n\t \t\tą¤•ą¤² ą¤Ÿą„ą¤Æą¤¾ą¤Ŗą¤æą¤™ ą¤—ą¤°ą„ą¤Ø ą¤Ŗą¤¾ą¤‰ą¤Øą„‡ ą¤µą¤æą¤§ą„‡ą¤Æą¤• ą¤Ŗą¤¾ą¤°ą¤æą¤¤\n\n\t \t\t\t \t', '\n\t \t\tą¤•ą„‹ą¤°ą„‹ą¤Øą¤¾ ą¤…ą¤øą¤° : ą¤øą¤¬ą„ˆą¤­ą¤Øą„ą¤¦ą¤¾ ą¤¬ą¤¢ą„€ ą¤Æą„ą¤ą¤ˆą¤®ą¤¾ ą¤Øą„‡ą¤Ŗą¤¾ą¤²ą„€ą¤²ą„‡ ą¤°ą„‹ą¤œą¤—ą¤¾ą¤°ą„€ ą¤—ą„ą¤®ą¤¾ą¤‰ą¤ą¤¦ą„ˆ\n\n\t \t\t\t \t', '\n\t \t\tą¤²ą¤æą¤Ŗą„ą¤²ą„‡ą¤• ą¤œą¤¾ą¤Ø ą¤¹ą¤æą¤ą¤”ą„‡ą¤•ą¤¾ ą¤†ą¤  ą¤œą¤Øą¤¾ ą¤Ŗą¤•ą„ą¤°ą¤¾ą¤‰\n\n\t \t\t\t \t'] import re new_list = [] pattern = r'(?:\n|\t| {2,})' for each in statements: matches = re.sub(pattern, r'', each) new_list.append(matches) print(new_list) #['ą¤øą¤¤ą„ą¤¤ą¤¾ą¤°ą„ą¤¢ ą¤¦ą¤²ą¤•ą„‹ ą¤µą¤æą¤µą¤¾ą¤¦ ą¤Ŗą¤Øą¤æ ą¤²ą¤•ą¤”ą¤¾ą¤‰ą¤Øą¤®ą¤¾', 'ą¤•ą¤² ą¤Ÿą„ą¤Æą¤¾ą¤Ŗą¤æą¤™ ą¤—ą¤°ą„ą¤Ø ą¤Ŗą¤¾ą¤‰ą¤Øą„‡ ą¤µą¤æą¤§ą„‡ą¤Æą¤• ą¤Ŗą¤¾ą¤°ą¤æą¤¤', 'ą¤•ą„‹ą¤°ą„‹ą¤Øą¤¾ ą¤…ą¤øą¤° : ą¤øą¤¬ą„ˆą¤­ą¤Øą„ą¤¦ą¤¾ ą¤¬ą¤¢ą„€ ą¤Æą„ą¤ą¤ˆą¤®ą¤¾ ą¤Øą„‡ą¤Ŗą¤¾ą¤²ą„€ą¤²ą„‡ ą¤°ą„‹ą¤œą¤—ą¤¾ą¤°ą„€ ą¤—ą„ą¤®ą¤¾ą¤‰ą¤ą¤¦ą„ˆ', 'ą¤²ą¤æą¤Ŗą„ą¤²ą„‡ą¤• ą¤œą¤¾ą¤Ø ą¤¹ą¤æą¤ą¤”ą„‡ą¤•ą¤¾ ą¤†ą¤  ą¤œą¤Øą¤¾ ą¤Ŗą¤•ą„ą¤°ą¤¾ą¤‰']
21st May 2020, 6:54 PM
Russ
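The pattern above deletes characters anywhere in the string. A narrower alternative, sketched here under the assumption that only the leading and trailing whitespace is unwanted, anchors the regex to the two ends of each string; Python's built-in `str.strip()` gives the same result without a regex:

```python
import re

statements = [
    '\n\t \t\tकल ट्यापिङ गर्न पाउने विधेयक पारित\n\n\t \t\t\t \t',
    '\n\t \t\tलिपुलेक जान हिँडेका आठ जना पक्राउ\n\n\t \t\t\t \t',
]

# ^\s+ matches whitespace at the very start, \s+$ at the very end
# (no re.MULTILINE, so ^ and $ refer to the whole string).
edge_ws = re.compile(r'^\s+|\s+$')
regex_cleaned = [edge_ws.sub('', each) for each in statements]

# The same cleanup without any regex.
strip_cleaned = [each.strip() for each in statements]

print(regex_cleaned)
```

Because the anchors pin the match to the ends, the spaces between words are never candidates for removal, even if a headline happened to contain two spaces in a row.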
No matter what the language is, all I want is the main text, whether it is written in Sanskrit or anything else. Actually, it's Nepali.
21st May 2020, 5:29 PM
Gaurav Giri
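Nepali and Hindi are written in the Devanagari script, and Sanskrit usually is too, so another option is to match the Unicode Devanagari block (U+0900 to U+097F) directly. This is a sketch assuming the wanted text contains only Devanagari characters, spaces and colons; the `noisy` string is a hypothetical example, not data from the question:

```python
import re

# Group 1 starts and ends with a character from the Devanagari block
# (U+0900-U+097F); in between it may also contain spaces and colons.
pattern = re.compile(r'([\u0900-\u097F](?:[\u0900-\u097F :]*[\u0900-\u097F])?)')

statements = [
    '\n\t \t\tकोरोना असर : सबैभन्दा बढी युएईमा नेपालीले रोजगारी गुमाउँदै\n\n\t \t\t\t \t',
    '\n\t \t\tलिपुलेक जान हिँडेका आठ जना पक्राउ\n\n\t \t\t\t \t',
]

# Hypothetical scraped snippet with HTML debris around the Devanagari text.
noisy = '<span>\n\t नेपाल</span>'

found = [m.group(1) for each in statements for m in re.finditer(pattern, each)]
noisy_found = [m.group(1) for m in re.finditer(pattern, noisy)]

print(found)
print(noisy_found)
```

Latin letters, ASCII digits and punctuation such as commas or quotes fall outside that block and would end a match, so add them to the middle character class if your text contains them.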