How to decode text from cp866 to UTF-8 using Python?

I have many files encoded in MS-DOS (cp866). They contain Russian characters, and I want to decode them all with a Python script. My code for one file:

    with open("some_file.txt", "r", encoding="cp866") as myfile:
        data = myfile.read()
    data_encoding = data.encode("utf-8")
    print(data_encoding)

Output: \xe2\x95\xac\xe2... etc.
I need: "Аовраоаладад..."
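A note on the output above: opening the file with encoding="cp866" already decodes it, so `data` is a `str`. Calling `data.encode("utf-8")` turns it back into `bytes`, and `print()` shows bytes as `\x..` escapes. To produce a UTF-8 file, write the `str` to a file opened with encoding="utf-8". A minimal sketch; the helper name `convert_file` and any output path are hypothetical:

```python
def convert_file(src_path, dst_path, src_encoding="cp866"):
    """Re-encode one text file to UTF-8 (hypothetical helper).

    src_encoding defaults to cp866 as stated in the question; pass a
    different codec if the text still comes out garbled.
    """
    # Opening with an explicit encoding decodes the bytes: text is a str.
    with open(src_path, "r", encoding=src_encoding) as src:
        text = src.read()
    # Writing a str to a file opened as UTF-8 does the encoding for us;
    # there is no need to call text.encode() by hand.
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(text)
    return text
```

Usage: `print(convert_file("some_file.txt", "some_file_utf8.txt"))` prints the decoded `str` rather than escaped bytes. If that text is full of box-drawing characters such as "╬" (which is exactly what \xe2\x95\xac encodes in UTF-8), the files are probably not cp866 after all.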

6th Mar 2020, 6:20 AM
Valerii Mamontov
7 Answers
I found a solution. I decoded the text with the "cp1251" charset and got readable text. Code:

    with open("some_file.txt", mode="r", encoding="KOI8-R") as myfile:
        data = myfile.read()
    b = bytes(data, "KOI8-R")
    data_encoding = str(b, "cp1251")
    print(data_encoding)
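Why this works, as far as I can tell: KOI8-R assigns a character to every byte value, so `bytes(data, "KOI8-R")` gives back exactly the bytes read from the file (apart from line endings, which text mode normalizes to \n). The KOI8-R step therefore cancels out, and the real work is `str(b, "cp1251")`, which suggests the files are Windows-1251 rather than cp866. Under that assumption, opening them with encoding="cp1251" directly should give the same result. A sketch comparing the two; both function names are hypothetical:

```python
def read_via_koi8r(path):
    # The approach from the post: decode as KOI8-R, re-encode to recover
    # the raw bytes, then decode those bytes as cp1251.
    with open(path, mode="r", encoding="KOI8-R") as myfile:
        data = myfile.read()
    b = bytes(data, "KOI8-R")
    return str(b, "cp1251")


def read_direct(path):
    # Shortcut, assuming the files really are Windows-1251 (cp1251).
    with open(path, mode="r", encoding="cp1251") as myfile:
        return myfile.read()
```

Both functions return the same `str` for a cp1251 file; the direct version simply skips the detour.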
6th Mar 2020, 1:21 PM
Valerii Mamontov
I cannot test it, because I have no text in cp866. But you can try this:

    result = text.encode('cp866').decode('cp866').encode('utf8')
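Why this chain cannot help, as far as I can see: `encode('cp866')` followed by `decode('cp866')` just returns the original string, so `result` equals `text.encode('utf8')`, which is bytes again, just like the output in the question. To undo text that was decoded with the wrong codec, re-encode it with the wrong codec and decode it with the right one. A sketch, assuming (as the accepted solution in this thread suggests) that the real encoding is cp1251; both function names are hypothetical:

```python
def suggested_chain(text):
    # The cp866 encode/decode pair cancels out, so this equals
    # text.encode('utf8'): a bytes object, not readable text.
    return text.encode('cp866').decode('cp866').encode('utf8')


def repair_mojibake(text, wrong="cp866", right="cp1251"):
    # Recover the original bytes with the codec that was wrongly used,
    # then decode them with the codec the file was actually written in.
    return text.encode(wrong).decode(right)
```

This works for cp866 as the "wrong" codec because cp866 maps all 256 byte values, so `encode('cp866')` restores the original bytes exactly.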
6th Mar 2020, 10:59 AM
Lothar
Here is a link to a website where you can paste text online, specify its encoding, and choose the output encoding as well: https://2cyr.com/decode/
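As a local alternative to that site, one can decode the same raw bytes with several candidate codecs and see which result is readable. A minimal sketch; the candidate list (common Cyrillic encodings) is an assumption, and `try_decodings` is a hypothetical helper:

```python
# Assumed candidate codecs; extend the tuple as needed.
CANDIDATES = ("cp866", "cp1251", "koi8_r", "utf-8")


def try_decodings(raw, candidates=CANDIDATES):
    """Decode raw bytes with each candidate codec.

    Returns {codec: decoded str}, or None for codecs that reject the bytes.
    """
    results = {}
    for codec in candidates:
        try:
            results[codec] = raw.decode(codec)
        except UnicodeDecodeError:
            results[codec] = None
    return results
```

Usage: read the file in binary mode (`open("some_file.txt", "rb")`), pass the bytes in, and print each result. Single-byte codecs such as cp866 and koi8_r accept any byte sequence, so "no error" does not mean "correct"; someone still has to look at which output reads as Russian.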
6th Mar 2020, 12:00 PM
Lothar
Valerii, very good!
6th Mar 2020, 3:06 PM
Lothar
Lothar, thanks, but your code doesn't solve the problem.
6th Mar 2020, 11:30 AM
Valerii Mamontov
Can you explain what happens? What is the output? Did you print it to the console, or did you also write it to a file?
6th Mar 2020, 11:55 AM
Lothar
Lothar, I printed "data" to the console, because saving incorrectly decoded data to a file would be pointless.
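Checking the printed `str` in the console first is sensible. Once it reads correctly, writing it to a new file opened with encoding="utf-8" is safe, and the question mentions many files. A sketch of a batch conversion; the directory layout, the `*.txt` pattern, the helper name `convert_folder`, and the cp1251 source encoding (taken from the solution in this thread) are all assumptions:

```python
from pathlib import Path


def convert_folder(src_dir, dst_dir, src_encoding="cp1251", pattern="*.txt"):
    """Write UTF-8 copies of every matching file in src_dir into dst_dir.

    Hypothetical helper; returns the list of written paths.
    """
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(src_dir.glob(pattern)):
        # Decode with the (assumed) source encoding...
        text = src.read_text(encoding=src_encoding)
        # ...and write the same text out as UTF-8 under the same file name.
        dst = dst_dir / src.name
        dst.write_text(text, encoding="utf-8")
        written.append(dst)
    return written
```

Writing into a separate folder keeps the originals untouched, so a wrong guess about the source encoding costs nothing.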
6th Mar 2020, 12:00 PM
Valerii Mamontov