+ 1

Regex (regular expression) help needed, as fast as possible!

statements = [
    '\n\t \t\tसत्तारुढ दलको विवाद पनि लकडाउनमा\n\n\t \t\t\t \t',
    '\n\t \t\tकल ट्यापिङ गर्न पाउने विधेयक पारित\n\n\t \t\t\t \t',
    '\n\t \t\tकोरोना असर : सबैभन्दा बढी युएईमा नेपालीले रोजगारी गुमाउँदै\n\n\t \t\t\t \t',
    '\n\t \t\tलिपुलेक जान हिँडेका आठ जना पक्राउ\n\n\t \t\t\t \t',
]

So I have a list of data written in the Nepali language. All I want is to get rid of everything else in these statements and retrieve only the main data written in Nepali. I want to use regex (regular expressions), and I want it in a format similar to this:

pattern = re.compile(r'______')  # pattern for the statements
for each in statements:
    matches = re.finditer(pattern, each)
    for match in matches:
        print(match.group(1))  # where group 1 or any other number means the main data

Or you can come up with something else of your own. Help needed!

21st May 2020, 5:14 PM
Gaurav Giri
2 Answers
+ 2
statements = [
    '\n\t \t\tसत्तारुढ दलको विवाद पनि लकडाउनमा\n\n\t \t\t\t \t',
    '\n\t \t\tकल ट्यापिङ गर्न पाउने विधेयक पारित\n\n\t \t\t\t \t',
    '\n\t \t\tकोरोना असर : सबैभन्दा बढी युएईमा नेपालीले रोजगारी गुमाउँदै\n\n\t \t\t\t \t',
    '\n\t \t\tलिपुलेक जान हिँडेका आठ जना पक्राउ\n\n\t \t\t\t \t',
]

import re

new_list = []
# Matches a newline, a tab, or a run of two or more spaces
pattern = r'(?:\n|\t| {2,})'
for each in statements:
    # The single spaces sitting between the tabs are not matched by the
    # pattern, so strip() removes the ones left at either end
    cleaned = re.sub(pattern, r'', each).strip()
    new_list.append(cleaned)
print(new_list)
# ['सत्तारुढ दलको विवाद पनि लकडाउनमा', 'कल ट्यापिङ गर्न पाउने विधेयक पारित', 'कोरोना असर : सबैभन्दा बढी युएईमा नेपालीले रोजगारी गुमाउँदै', 'लिपुलेक जान हिँडेका आठ जना पक्राउ']
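For the finditer / group(1) format asked for in the question, something along these lines might also work. It is only a sketch: it assumes each statement holds a single headline on one line, surrounded by whitespace, and it reuses just the first two strings from the question. The name `headlines` is made up for the illustration.

```python
import re

# First two strings from the question, used as sample input
statements = [
    '\n\t \t\tसत्तारुढ दलको विवाद पनि लकडाउनमा\n\n\t \t\t\t \t',
    '\n\t \t\tकल ट्यापिङ गर्न पाउने विधेयक पारित\n\n\t \t\t\t \t',
]

# Skip leading whitespace, capture text that starts and ends on a
# non-whitespace character, then skip trailing whitespace.
# Assumption: the headline itself contains no newline, since '.' does
# not match '\n' by default.
pattern = re.compile(r'^\s*(\S(?:.*\S)?)\s*$')

headlines = []  # hypothetical name, not from the thread
for each in statements:
    for match in re.finditer(pattern, each):
        headlines.append(match.group(1))

print(headlines)
```

Because the capture group is defined by what surrounds it rather than by the script, the same pattern should work for any language, not only Nepali. For this particular job, plain `each.strip()` gives the same result without a regex.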
21st May 2020, 6:54 PM
Russ
0
No matter what the language is, all I want is the main data, whether it is written in Sanskrit or anything else. Actually, it's the Nepali language.
21st May 2020, 5:29 PM
Gaurav Giri