+ 2
How to decode text (cp866) to text (utf8) using Python?
I have many files in an MS-DOS encoding (cp866) that contain Russian characters. I want to convert them all with a Python script. My code for one file:
with open("some_file.txt", "r", encoding="cp866") as myfile:
    data = myfile.read()
data_encoding = data.encode("utf-8")
print(data_encoding)
Out: \xe2\x95\xac\xe2... etc.
I need: "Аовраоаладад..."
7 answers
+ 2
I found a solution. I decoded the text using the "cp1251" charset and got readable text.
Code:
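with open("some_file.txt", mode="r", encoding="KOI8-R") as myfile:
    data = myfile.read()
# Undo the KOI8-R decoding to get the original bytes back
b = bytes(data, "KOI8-R")
# Decode those bytes as cp1251, the charset that gave readable text
data_encoding = str(b, "cp1251")
print(data_encoding)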
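Since the KOI8-R step only turns the text back into the original bytes, the same result should come from opening the file with encoding="cp1251" directly. Below is a minimal sketch of how that could be applied to many files at once. The helper names convert_file and convert_folder, the "utf8" output folder, and the "*.txt" pattern are assumptions for illustration, not something from the thread, and cp1251 is only the default because it is what worked above.

```python
from pathlib import Path


def convert_file(src, dst, source_encoding="cp1251"):
    """Re-save one text file as UTF-8 (hypothetical helper)."""
    # Decode the file with the encoding it was really written in
    text = Path(src).read_text(encoding=source_encoding)
    # Write the same text back out as UTF-8
    Path(dst).write_text(text, encoding="utf-8")
    return text


def convert_folder(folder, source_encoding="cp1251"):
    """Convert every *.txt file in a folder into a 'utf8' subfolder."""
    out_dir = Path(folder) / "utf8"  # assumed output location
    out_dir.mkdir(exist_ok=True)
    # glob("*.txt") is not recursive, so already converted files are skipped
    for src in sorted(Path(folder).glob("*.txt")):
        convert_file(src, out_dir / src.name, source_encoding)
```

Writing to a separate folder keeps the original files untouched, so a wrong guess about the source encoding can be retried.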
+ 3
I cannot test it, because I have no text in cp866. But you can try this:
result = text.encode('cp866').decode('cp866').encode('utf8')
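A small sketch of what may be going on, assuming the file was actually saved as cp1251 rather than cp866. The bytes \xe2\x95\xac from the question are the UTF-8 form of the box-drawing character "╬", which is what the cp1251 byte for "О" turns into when it is read as cp866. The sketch also shows that encoding to cp866 and decoding from cp866 again gives back the same string, so that round trip alone cannot repair text that was decoded with the wrong charset.

```python
# One Cyrillic letter as it would be stored in a cp1251 file
raw = "О".encode("cp1251")          # b'\xce'

# Reading that byte as cp866 gives a box-drawing character instead
wrong = raw.decode("cp866")
print(wrong.encode("utf-8"))        # b'\xe2\x95\xac'

# Reading it as cp1251 gives the letter back
right = raw.decode("cp1251")
print(right == "О")                 # True

# encode + decode with the same charset is a no-op on the string
round_trip = wrong.encode("cp866").decode("cp866")
print(round_trip == wrong)          # True, still the wrong character
```

Note also that print() on the result of .encode("utf-8") shows a bytes object with escape sequences. To see readable characters in the console, print the decoded str itself.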
+ 3
Here is a link to a website where you can paste text, specify its encoding, and choose the output encoding:
https://2cyr.com/decode/
+ 3
Valerii, very good!
+ 1
Lothar, thanks, but your code doesn't solve the task.
+ 1
Can you explain what happens? What is the output? Did you output to the console, or did you also output to a file?
+ 1
Lothar, I printed "data" to the console,
because saving incorrectly decoded data to a file would be pointless.