0
How can I get rid of apostrophes in a txt file?
I have a txt file and I'm trying to build a list that contains all the words in it, but I need to get rid of apostrophes, dots, commas, etc. I used the strip function for commas, dots and indentation, and I used replace for apostrophes, but it didn't work. I don't understand why, because it normally works on strings. Here is my code. I can also provide the txt file if you need it. https://code.sololearn.com/cN8sPyyJUU5z/?ref=app
11 Answers
+ 6
You can try this, and you can easily adapt it to more characters:
https://code.sololearn.com/c4Z8hOVAMO6j/?ref=app
+ 5
It would be very helpful if you could copy a few lines from your input text BEFORE it is processed. Maybe you can also give us a sample of how the result should look.
+ 2
If I were given this task, I'd use regular expressions, i.e. "re.findall".
Use the pattern "[a-zA-Z0-9]+"
Much, much easier than the way you're trying to do it in your code.
Edit: your code does actually work anyway. What problem are you getting?
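A minimal sketch of the re.findall idea, using a short fragment of the sample text quoted in this thread. It is only an illustration, not the code behind the links. Note one assumption worth checking against your own file: the class "[a-zA-Z0-9]+" is ASCII-only, so Turkish letters such as "ş" act as separators, while "\w+" matches Unicode letters for str patterns in Python 3.

```python
import re

# Fragment of the sample text; escapes are used so the exact characters are unambiguous.
# \u2019 is the curly apostrophe, \u015f is the letter "ş".
text = "Dostoyevski Rusya\u2019da ya\u015fanan siyasi ve ekonomik olaylar."

# ASCII-only class: any non-ASCII letter splits the word in two.
ascii_words = re.findall(r"[a-zA-Z0-9]+", text)

# \w matches Unicode letters, digits and underscore, so Turkish words stay whole.
unicode_words = re.findall(r"\w+", text)

print(ascii_words)
print(unicode_words)
```

With the ASCII class, "yaşanan" comes back as two pieces ("ya" and "anan"), which is the kind of side effect to expect on Turkish text.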
+ 2
Regex is the first choice. If not, you could write a function that takes the strings to be removed as parameters and calls .replace() for each one. Something like this:
https://code.sololearn.com/c0h95U1GcMWy/?ref=app
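For readers who cannot open the link, here is a rough sketch of that kind of helper. The name remove_chars and its parameters are hypothetical and are not taken from the linked code. It replaces each unwanted string with a space so that "Dostoyevski’nin" splits into two words.

```python
def remove_chars(text, *unwanted, replacement=" "):
    """Return text with every string in `unwanted` replaced by `replacement`."""
    for item in unwanted:
        text = text.replace(item, replacement)
    return text

# Last sentence of the sample text; \u2019 is the curly apostrophe.
sample = "Suç ve Ceza Dostoyevski\u2019nin başyapıtı sayılır."

# Pass both the straight (') and the curly (\u2019) apostrophe to be safe.
cleaned = remove_chars(sample, ".", ",", "'", "\u2019")
words = cleaned.split()
print(words)
```

Replacing with a space instead of an empty string is a design choice: an empty string would glue "Dostoyevski" and "nin" together.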
+ 2
Lothar this is a part of my text "Dostoyevski Rusya’da yaşanan siyasi ve ekonomik olaylar sonrasında gözlemlediği hayatlardan esinlenerek 1866 yılında yazdığı eser ilk olarak Rus Habercisi isimli edebiyat dergisinde yayınlanmıştır. Büyük beğeni toplayan eser daha sonra kitap haline getirilmiş ve o günden beri birçok kitap ve sinema filmine konu olmuştur. Suç ve Ceza Dostoyevski’nin başyapıtı sayılır."
I'm trying to get a list like this:
["Dostoyevski", "Rusya", "da", "yaşanan"............]
+ 2
import re

class File:
    def __init__(self):
        with open("file.txt", "r", encoding="utf-8") as file:
            self.naked_words = re.findall(r"\w+", file.read())  # < edited to \w+

myfile = File()
print(myfile.naked_words)
+ 2
Lothar Awesome. It literally worked. Thanks a lot.👍
+ 2
Baran Aldemir my code worked. It was not working in your case because you never told it to replace ’. Just pass whatever you want to replace as arguments in the function (strip_params) call (in line 10).
+ 2
Haha I'm so dumb. I've just realized my main problem was the difference between the (') and (’) symbols. I guess we don't have (’) on the keyboard? XXX That's why I couldn't make your code work.
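A small sketch that makes the difference between the two symbols visible. They are separate Unicode characters, so replacing one leaves the other untouched. The variable names are only for illustration.

```python
import unicodedata

straight = "'"     # U+0027, the apostrophe found on most keyboards
curly = "\u2019"   # U+2019, the character used in the sample text

# Print the code point and official Unicode name of each character.
for ch in (straight, curly):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))

word = "Rusya\u2019da"
print(word.replace(straight, " "))  # no straight apostrophe in the word, so nothing changes
print(word.replace(curly, " "))     # the curly apostrophe is replaced by a space
```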
+ 1
Thank you guys for your responses. rodwynnejones I'll look into regex; I don't know that subject yet. XXX I applied your function to my file but somehow it doesn't work. The apostrophes are still there.
+ 1
rodwynnejones It worked to some extent. But there are some side effects 😅 probably because of Turkish characters. I should probably read up on regex first. I appreciate all the effort. Thanks again.