0

How can I get rid of apostrophes in a txt file?

I have a txt file and I'm trying to get a list that contains all the words in it, but I need to get rid of apostrophes, dots, commas, etc. I used the strip function for commas, dots and indentation, and replace for apostrophes, but it didn't work. I don't understand why, because it normally works on strings. Here is my code. I can also provide the txt file if you need it. https://code.sololearn.com/cN8sPyyJUU5z/?ref=app

14th Mar 2020, 8:00 PM
Baran Aldemir
11 Answers
+ 6
You can try this; it is easy to adapt to more characters: https://code.sololearn.com/c4Z8hOVAMO6j/?ref=app
14th Mar 2020, 9:59 PM
Lothar
+ 5
It would be very helpful if you could copy a few lines from your input text BEFORE it is processed. Maybe you could also give us a sample of how the output should look.
14th Mar 2020, 9:23 PM
Lothar
+ 2
If I were given this task, I'd use regular expressions, i.e. "re.findall" with the pattern "[a-zA-Z0-9]+". Much easier than the way you're trying to do it in your code. Edit: your code does actually work anyway... what problem are you getting?
14th Mar 2020, 8:35 PM
rodwynnejones
+ 2
Regex is the first choice. If not, you could write a function that takes the strings to be removed as parameters and calls .replace() for each one. Something like this: https://code.sololearn.com/c0h95U1GcMWy/?ref=app
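A minimal sketch of the replace-based idea, for readers who cannot open the link. The name strip_params comes from a later reply in this thread, but the body below is an assumption and not the linked code. Replacing each unwanted string with a space (rather than deleting it) is also a choice made here, so that a word like "Rusya’da" splits into two words.

```python
def strip_params(text, *unwanted):
    """Return text with every string in `unwanted` replaced by a space.

    Hypothetical implementation: only the function name appears in the thread.
    """
    for item in unwanted:
        text = text.replace(item, " ")
    return text


sample = "Dostoyevski Rusya’da, Suç ve Ceza."

# Pass both the curly (’) and the straight (') apostrophe, plus punctuation.
words = strip_params(sample, "’", "'", ",", ".").split()
print(words)
```

Calling split() with no arguments collapses the runs of spaces left behind by the replacements, so no empty strings end up in the list.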
14th Mar 2020, 8:57 PM
XXX
+ 2
Lothar this is a part of my text "Dostoyevski Rusya’da yaşanan siyasi ve ekonomik olaylar sonrasında gözlemlediği hayatlardan esinlenerek 1866 yılında yazdığı eser ilk olarak Rus Habercisi isimli edebiyat dergisinde yayınlanmıştır. Büyük beğeni toplayan eser daha sonra kitap haline getirilmiş ve o günden beri birçok kitap ve sinema filmine konu olmuştur. Suç ve Ceza Dostoyevski’nin başyapıtı sayılır." I'm trying to get a list like this: ["Dostoyevski", "Rusya", "da", "yaşanan"............]
14th Mar 2020, 9:29 PM
Baran Aldemir
+ 2
import re

class File:
    def __init__(self):
        with open("file.txt", "r", encoding="utf-8") as file:
            self.naked_words = re.findall(r"\w+", file.read())  # < edited to \w+

myfile = File()
print(myfile.naked_words)
14th Mar 2020, 9:31 PM
rodwynnejones
+ 2
Lothar Awesome. It literally worked. Thanks a lot.👍
14th Mar 2020, 10:24 PM
Baran Aldemir
+ 2
Baran Aldemir my code worked. It did not work in your case because you never told it to replace ’ . Just pass whatever you want to replace as arguments in the function (strip_params) call (in line 10).
15th Mar 2020, 4:53 AM
XXX
+ 2
Haha I'm so dumb. I've just realized my main problem was the difference between the (') and (’) symbols. I guess we don't have (’) on the keyboard? XXX That's why I couldn't make your code work.
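A small sketch of why the two symbols behave differently, assuming the file uses the curly apostrophe seen in the sample text. They are separate Unicode characters, so replace() on one leaves the other untouched. Writing the curly one as "\u2019" avoids having to type it on a keyboard.

```python
import unicodedata

straight = "'"        # the apostrophe found on most keyboards
curly = "\u2019"      # the apostrophe used in the sample text

# Print the code point and official Unicode name of each character.
for ch in (straight, curly):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))

text = "Rusya\u2019da"

# Replacing the straight apostrophe does nothing here...
unchanged = text.replace(straight, " ")
# ...while replacing the curly one splits the word as intended.
fixed = text.replace(curly, " ")

print(unchanged == text)
print(fixed.split())
```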
15th Mar 2020, 2:16 PM
Baran Aldemir
+ 1
Thank you guys for your responses. rodwynnejones I'll look into regex; I don't know that subject yet. XXX I implemented your function in my file but somehow it doesn't work. The apostrophes still remain.
14th Mar 2020, 9:24 PM
Baran Aldemir
+ 1
rodwynnejones It worked to some extent, but there are some side effects 😅 probably because of Turkish characters. I should probably read about regex first. I appreciate all the efforts. Thanks again.
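One likely cause of such side effects, offered as a guess because the exact output was not posted: an ASCII-only pattern such as "[a-zA-Z0-9]+" treats Turkish letters like ü and ğ as separators and cuts words apart. In Python 3, \w matches Unicode letters by default when the pattern is a str. The sample string below is taken from the text quoted in this thread.

```python
import re

sample = "Büyük beğeni toplayan eser"

# ASCII-only character class: ü and ğ are not matched, so words are split.
ascii_only = re.findall(r"[a-zA-Z0-9]+", sample)

# \w on a str pattern is Unicode-aware, so Turkish letters stay in the word.
unicode_aware = re.findall(r"\w+", sample)

print(ascii_only)
print(unicode_aware)
```

Note that \w also matches digits and the underscore, so a token such as "1866" would be kept in the word list.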
14th Mar 2020, 9:41 PM
Baran Aldemir