0
How can I get rid of apostrophes in a txt file?
I have a txt file and I'm trying to get a list which contains all the words in it. But I need to get rid of apostrophes, dots, commas etc. I used the strip function for commas, dots and indentation, and replace for apostrophes, but it didn't work. I don't understand why, because it normally works on strings. Here is my code. I can also provide the txt file if you guys need it. https://code.sololearn.com/cN8sPyyJUU5z/?ref=app
11 Answers
+ 6
You can try this, and you can easily adapt it to more characters:
https://code.sololearn.com/c4Z8hOVAMO6j/?ref=app
+ 5
It would be very helpful if you could just copy some lines from your input text BEFORE it is processed. And maybe you can also give us a sample of how the result should look.
+ 2
If I were given this task, I'd be using regular expressions, i.e. "re.findall".
Use the pattern "[a-zA-Z0-9]+"
Much, much easier than the way you're trying to do it in your code.
Edit: your code does actually work anyway... what problem are you getting?
+ 2
Regex is the first choice. If not, then you could make a function that takes the strings to be removed as parameters and calls .replace() for each parameter. Something like this:
https://code.sololearn.com/c0h95U1GcMWy/?ref=app
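For readers who can't open the link, here is a minimal sketch of that idea. It is not the linked code: the name strip_params is borrowed from a later reply in this thread, and its signature, the sample sentence and the choice to replace with a space (so that "Rusya’da" splits into two words) are all assumptions.

```python
def strip_params(text, *to_remove):
    """Replace every string in to_remove with a space (hypothetical helper)."""
    for item in to_remove:
        text = text.replace(item, " ")
    return text


# Made-up sample line; \u2019 is the curly apostrophe used in the Turkish text.
sample = "Dostoyevski Rusya\u2019da, Su\u00e7 ve Ceza."

# Pass each unwanted character explicitly, including both apostrophe variants.
cleaned = strip_params(sample, "\u2019", "'", ",", ".")
words = cleaned.split()  # split() with no argument also collapses repeated spaces
print(words)
```

Anything that is not passed to the function stays in the text, so a character that merely looks like one of the arguments will survive.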
+ 2
Lothar this is a part of my text "Dostoyevski Rusya’da yaşanan siyasi ve ekonomik olaylar sonrasında gözlemlediği hayatlardan esinlenerek 1866 yılında yazdığı eser ilk olarak Rus Habercisi isimli edebiyat dergisinde yayınlanmıştır. Büyük beğeni toplayan eser daha sonra kitap haline getirilmiş ve o günden beri birçok kitap ve sinema filmine konu olmuştur. Suç ve Ceza Dostoyevski’nin başyapıtı sayılır."
I'm trying to get a list like this:
["Dostoyevski", "Rusya", "da", "yaşanan", ...]
+ 2
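import re

class File:
    def __init__(self):
        # Read the whole file as UTF-8 so the Turkish characters decode correctly
        with open("file.txt", "r", encoding="utf-8") as file:
            # \w+ matches runs of letters, digits and underscores (edited to \w+)
            self.naked_words = re.findall(r"\w+", file.read())

myfile = File()
print(myfile.naked_words)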
+ 2
Lothar Awesome. It literally worked. Thanks a lot.
+ 2
Baran Aldemir my code worked. It was not working in your case because you never told it to replace ’ . Just pass whatever you want to replace as arguments in the function (strip_params) call (in line 10).
+ 2
Haha I'm so dumb. I've just realized my main problem was the difference between the (') and (’) symbols. I guess we don't have (’) on the keyboard? XXX That's why I couldn't make your code work.
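A small sketch of why the two symbols behave differently: they are separate Unicode characters, so replacing one never touches the other. The variable names and the sample word are only for illustration.

```python
import unicodedata

straight = "'"       # U+0027, the key found on most keyboards
curly = "\u2019"     # U+2019, the character that appears in the text file

text = "Dostoyevski\u2019nin"

# Replacing the straight apostrophe leaves the curly one untouched.
after_straight = text.replace(straight, " ")
# Replacing the curly apostrophe is what actually splits the word.
after_curly = text.replace(curly, " ")

print(unicodedata.name(straight))  # APOSTROPHE
print(unicodedata.name(curly))     # RIGHT SINGLE QUOTATION MARK
print(after_straight == text)      # True, nothing was replaced
print(after_curly.split())
```

Since (’) is hard to type, it can be written in code as the escape "\u2019" or copied straight from the text file.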
+ 1
Thank you guys for your responses. rodwynnejones I'll check the regex subject. I don't know that subject yet. XXX I implemented your function in my file but somehow it doesn't work. Apostrophes are still remaining.
+ 1
rodwynnejones It worked at some level. But there are some other side effects,
probably because of Turkish characters. I should probably read about regex first. I appreciate all the efforts. Thanks again.
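One plausible reading of those side effects, sketched on a made-up sample line: the class "[a-zA-Z0-9]+" only covers ASCII letters and digits, so a letter such as ş acts as a separator and cuts words apart, while "\w+" on a Python 3 str also matches non-ASCII letters.

```python
import re

# Made-up sample with a curly apostrophe and a Turkish letter.
sample = "Rusya\u2019da yaşanan olaylar"

# ASCII-only class: "ş" is not in a-z, so "yaşanan" is cut in two.
ascii_words = re.findall(r"[a-zA-Z0-9]+", sample)

# \w is Unicode-aware for str patterns, so Turkish letters stay inside words.
unicode_words = re.findall(r"\w+", sample)

print(ascii_words)
print(unicode_words)
```

Note that \w also matches digits and the underscore, so it is slightly broader than "letters only".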