0

How can I get rid of apostrophes in a txt file?

I have a txt file and I'm trying to get a list which contains all the words in it. But I need to get rid of apostrophes, dots, commas, etc. I used the strip function for commas, dots and indentation, and I used replace for apostrophes, but it didn't work. I don't understand why, because it normally works on strings. Here is my code. I can also provide the txt file if you guys need it. https://code.sololearn.com/cN8sPyyJUU5z/?ref=app

14th Mar 2020, 8:00 PM
Baran Aldemir
11 Answers
+ 6
You can try this, and you can easily adapt it to more characters: https://code.sololearn.com/c4Z8hOVAMO6j/?ref=app
14th Mar 2020, 9:59 PM
Lothar
+ 5
It would be very helpful if you could copy some lines from your input text BEFORE it is processed. And maybe you could also give us a sample of how it should look.
14th Mar 2020, 9:23 PM
Lothar
+ 2
If I were given this task, I'd use regular expressions, i.e. "re.findall" with the pattern "[a-zA-Z0-9]+". Much easier than the way you're trying to do it in your code. Edit: your code does actually work anyway. What problem are you getting?
14th Mar 2020, 8:35 PM
rodwynnejones
+ 2
Regex is the first choice. If not, then you could make a function that takes the strings to be removed as parameters and calls .replace() for each of them. Something like this: https://code.sololearn.com/c0h95U1GcMWy/?ref=app
14th Mar 2020, 8:57 PM
XXX
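The linked code is not reproduced in this thread, so the following is only a minimal sketch of the idea described above, not XXX's actual function. The name `remove_strings` and its signature are hypothetical. It replaces each unwanted string with a space (rather than with nothing) so that "Rusya’da" splits into "Rusya" and "da", as the asker wants.

```python
def remove_strings(text, *unwanted):
    """Replace every string in `unwanted` with a space (hypothetical helper)."""
    for item in unwanted:
        text = text.replace(item, " ")
    return text


sample = "Dostoyevski Rusya’da yaşanan siyasi ve ekonomik olaylar."

# Pass both the straight (') and the curly (’) apostrophe, plus punctuation.
cleaned = remove_strings(sample, ".", ",", "'", "’")

# split() with no arguments collapses the runs of spaces left behind.
words = cleaned.split()
print(words)
```

Every character to be removed has to be listed explicitly, which is the main drawback compared to a regular expression.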
+ 2
Lothar this is a part of my text "Dostoyevski Rusya’da yaşanan siyasi ve ekonomik olaylar sonrasında gözlemlediği hayatlardan esinlenerek 1866 yılında yazdığı eser ilk olarak Rus Habercisi isimli edebiyat dergisinde yayınlanmıştır. Büyük beğeni toplayan eser daha sonra kitap haline getirilmiş ve o günden beri birçok kitap ve sinema filmine konu olmuştur. Suç ve Ceza Dostoyevski’nin başyapıtı sayılır." I'm trying to get a list like this: ["Dostoyevski", "Rusya", "da", "yaşanan"............]
14th Mar 2020, 9:29 PM
Baran Aldemir
+ 2
import re

class File:
    def __init__(self):
        with open("file.txt", "r", encoding="utf-8") as file:
            self.naked_words = re.findall(r"\w+", file.read())  # < edited to \w+

myfile = File()
print(myfile.naked_words)
14th Mar 2020, 9:31 PM
rodwynnejones
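To see why the pattern was changed to \w+, here is a small comparison on a fragment of the sample text quoted in this thread. It is only an illustrative sketch; the variable names are made up. In Python 3, \w on a str pattern matches Unicode letters (and digits and the underscore), while "[a-zA-Z0-9]+" matches ASCII letters and digits only, so Turkish letters such as ş break words apart.

```python
import re

sample = "Dostoyevski Rusya’da yaşanan siyasi ve ekonomik olaylar"

# ASCII-only character class: "yaşanan" is cut at the non-ASCII letter ş.
ascii_only = re.findall(r"[a-zA-Z0-9]+", sample)

# \w is Unicode-aware for str patterns, so ş counts as a word character,
# while the curly apostrophe ’ does not and therefore separates "Rusya" and "da".
unicode_aware = re.findall(r"\w+", sample)

print(ascii_only)
print(unicode_aware)
```

Note that \w+ also keeps digits and underscores, so a token like 1866 stays in the list.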
+ 2
Lothar Awesome. It literally worked. Thanks a lot.👍
14th Mar 2020, 10:24 PM
Baran Aldemir
+ 2
Baran Aldemir my code worked. It was not working in your case because you never told it to replace ’ . Just pass whatever you want to replace as arguments in the function (strip_params) call (on line 10).
15th Mar 2020, 4:53 AM
XXX
+ 2
Haha I'm so dumb. I've just realized my main problem was the difference between the (') and (’) symbols. I guess we don't have (’) on the keyboard? XXX That's why I couldn't make your code work.
15th Mar 2020, 2:16 PM
Baran Aldemir
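The two symbols are different Unicode characters, which is why replacing one leaves the other untouched. A short sketch to check this (the variable names are just for illustration):

```python
import unicodedata

straight = "'"   # the apostrophe typed on most keyboards
curly = "’"      # the typographic apostrophe found in the sample text

# Print the code point and official Unicode name of each character.
for ch in (straight, curly):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

text = "Dostoyevski’nin"

# Replacing the straight apostrophe changes nothing, because the text
# contains the curly one (U+2019).
unchanged = text.replace("'", " ")

# Replacing U+2019 does what was intended.
fixed = text.replace("\u2019", " ")

print(unchanged)
print(fixed)
```

Writing the character as "\u2019" in the source avoids having to type it on a keyboard that lacks it.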
+ 1
Thank you guys for your responses. rodwynnejones I'll look into regex. I don't know that subject yet. XXX I added your function to my file, but somehow it doesn't work. The apostrophes are still there.
14th Mar 2020, 9:24 PM
Baran Aldemir
+ 1
rodwynnejones It worked to some extent. But there are some side effects 😅 probably because of Turkish characters. I should probably read about regex first. I appreciate all the effort. Thanks again.
14th Mar 2020, 9:41 PM
Baran Aldemir