+ 31

Utf

What is the difference between UTF-8 and UTF-16?

27th Sep 2020, 12:19 PM
JAY • ≫
17 answers
+ 40
Part 1

Both UTF-8 and UTF-16 are variable-length encodings. In UTF-8 a character occupies at least 8 bits (1 to 4 bytes), while in UTF-16 a character occupies at least 16 bits (2 or 4 bytes).

Main UTF-8 pros:
Basic ASCII characters (digits, unaccented Latin letters, etc.) occupy one byte each, identical to their US-ASCII representation. Every US-ASCII string is therefore valid UTF-8, which gives decent backwards compatibility in many cases.
No null bytes appear unless the text itself contains U+0000, so null-terminated strings still work, which adds a great deal of backwards compatibility too.
UTF-8 is independent of byte order, so you don't have to worry about big-endian vs. little-endian issues.

Main UTF-8 cons:
Many common characters have different lengths, so indexing by code point and counting code points require scanning the string, which is much slower than with a fixed-width encoding.
Even though byte order doesn't matter, UTF-8 text sometimes still starts with a BOM (byte order mark), which only serves as a signature that the text is encoded in UTF-8.
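To make the size and compatibility points concrete, here is a minimal Python sketch. The sample characters and variable names are my own choices for illustration, not part of the answer above.

```python
# Sample characters chosen for illustration: ASCII, Latin-1 range,
# another BMP character, and a supplementary (non-BMP) character.
samples = ["A", "é", "€", "😀"]

# Bytes needed per character in each encoding.
# "utf-16-le" is used so that no BOM is added to the count.
utf8_sizes = [len(ch.encode("utf-8")) for ch in samples]
utf16_sizes = [len(ch.encode("utf-16-le")) for ch in samples]

# A pure ASCII string has exactly the same bytes in ASCII and UTF-8.
ascii_text = "Hello, 123"
same_bytes = ascii_text.encode("ascii") == ascii_text.encode("utf-8")

# Non-ASCII text encoded as UTF-8 contains no zero bytes
# (unless the text itself contains U+0000).
has_null = b"\x00" in "Héllo €".encode("utf-8")

print(utf8_sizes, utf16_sizes, same_bytes, has_null)
```

The size lists show UTF-8 growing from 1 to 4 bytes per character, while UTF-16 uses 2 bytes for everything in the BMP and 4 bytes beyond it.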
27th Sep 2020, 12:34 PM
Raj Srivastava
+ 25
Part 2

Main UTF-16 pros:
BMP (Basic Multilingual Plane) characters, including Latin, Cyrillic, most Chinese (the PRC made support for some code points outside the BMP mandatory) and most Japanese, can be represented with 2 bytes. This speeds up indexing and counting code points as long as the text contains no supplementary characters.
Even if the text has supplementary characters, they are still represented by pairs of 16-bit values (surrogate pairs), so the total length is still divisible by two and a 16-bit char can serve as the basic unit of the string.

Main UTF-16 cons:
Lots of null bytes in US-ASCII strings, which means no byte-oriented null-terminated strings and a lot of wasted memory.
Using it as a fixed-length encoding “mostly works” in many common scenarios (especially in the US / EU / countries with Cyrillic alphabets / Israel / Arab countries / Iran and many others), so bugs with supplementary characters go unnoticed and support is often broken where it doesn't work.
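A small Python sketch of these UTF-16 points follows. The helper name `utf16_units` is hypothetical, and the sample strings are assumptions chosen for illustration.

```python
import struct

def utf16_units(text):
    """Return the 16-bit code units of text in UTF-16 (hypothetical helper)."""
    data = text.encode("utf-16-le")  # little-endian, no BOM
    return list(struct.unpack("<%dH" % (len(data) // 2), data))

# A BMP character (Cyrillic) fits in one 16-bit unit.
bmp_units = utf16_units("Я")

# A supplementary character needs a surrogate pair: two 16-bit units.
supp_units = utf16_units("😀")

# ASCII text in UTF-16 has a zero byte for every character.
ascii_bytes = "Hi".encode("utf-16-le")

# Treating UTF-16 as fixed-length miscounts: 3 code points, 4 code units.
text = "a😀b"
code_points = len(text)
code_units = len(utf16_units(text))

print([hex(u) for u in supp_units], ascii_bytes, code_points, code_units)
```

The mismatch between `code_points` and `code_units` is exactly the case where "one 16-bit unit per character" code breaks.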
27th Sep 2020, 12:35 PM
Raj Srivastava
+ 19
Last words: In general, UTF-16 is often a good fit for in-memory representation because BE/LE is irrelevant there (just use the native byte order) and indexing by 16-bit unit is fast (just don't forget to handle surrogate pairs properly). UTF-8, on the other hand, is extremely good for text files and network protocols because there is no BE/LE issue, and both null-termination and ASCII compatibility often come in handy. Not totally my words. Happy Coding </>
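The byte order and BOM points from this thread can be checked with a short Python sketch. The sample text is an assumption; `utf-8-sig` is Python's codec name for UTF-8 with a leading BOM.

```python
import codecs

text = "Z€"  # sample text chosen for illustration

# UTF-16 has two possible byte orders for the same text.
big_endian = text.encode("utf-16-be")
little_endian = text.encode("utf-16-le")

# UTF-8 has only one byte sequence, regardless of platform.
utf8 = text.encode("utf-8")

# A UTF-8 BOM is just a three-byte signature at the start of the data.
utf8_with_bom = text.encode("utf-8-sig")
bom_present = utf8_with_bom.startswith(codecs.BOM_UTF8)

print(big_endian.hex(), little_endian.hex(), utf8.hex(), bom_present)
```

Decoding with `utf-8-sig` strips the signature again, so `utf8_with_bom.decode("utf-8-sig")` gives back the original text.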
27th Sep 2020, 12:35 PM
Raj Srivastava
+ 9
They are file encodings. An encoding defines how text is converted to binary and back (input -> binary -> screen), which is what lets your laptop show different languages, including programming languages. For more, see here: https://en.m.wikipedia.org/wiki/Comparison_of_Unicode_encodings
27th Sep 2020, 12:27 PM
Ananiya Jemberu
+ 6
UTF-8 = variable length, built from 8-bit units. UTF-16 = variable length, built from 16-bit units. Hope this is helpful and easy to understand 😃😃👼
27th Sep 2020, 2:18 PM
Rishbabh Sharma
+ 5
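Unicode code points 0 to 127 are identical to ASCII, and UTF-8 encodes each of them as the same single byte. Code points 128 to 159 are control characters, with no printable characters assigned. Code points 160 to 255 match ISO 8859-1 (and ANSI / Windows-1252 in that range), but UTF-8 encodes each of them as two bytes. Unicode continues from 256 with more than 100,000 further characters.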
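One way to read the ranges above is as Unicode code point values rather than raw bytes. A minimal Python check of that reading (variable names are my own):

```python
# Code points 0..127: UTF-8 produces the same single byte as ASCII.
ascii_range_same = all(
    chr(i).encode("utf-8") == bytes([i]) for i in range(128)
)

# Code points 128..255: UTF-8 needs two bytes for each.
high_range_two_bytes = all(
    len(chr(i).encode("utf-8")) == 2 for i in range(128, 256)
)

# "é" has the same code point number in Unicode and ISO 8859-1 (233),
# but the encoded bytes differ between Latin-1 and UTF-8.
ch = "é"
latin1_bytes = ch.encode("latin-1")
utf8_bytes = ch.encode("utf-8")
same_number = ord(ch) == latin1_bytes[0]

print(ascii_range_same, high_range_two_bytes, latin1_bytes, utf8_bytes)
```

So a file saved as ISO 8859-1 is only byte-compatible with UTF-8 when it sticks to the ASCII range.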
28th Sep 2020, 4:14 PM
Michael Victor
- 1
Wow Raj 丂尺ノ√ム丂イム√ム you are the best 🙌
29th Sep 2020, 8:54 AM
Ore
- 1
Hi
1st Oct 2020, 12:28 PM
Astitva Gupta
- 2
Hi
29th Sep 2020, 10:05 AM
Asi Reddy Charan
- 5
How do I make an app?
29th Sep 2020, 8:58 AM
Surya
- 6
Way not being
29th Sep 2020, 4:38 AM
Байэл Аскаров
- 8
Nice
28th Sep 2020, 1:45 PM
Jet Zani
- 8
Hello
28th Sep 2020, 2:29 PM
Astitva Gupta
- 8
Hello
29th Sep 2020, 1:02 AM
Guedjali Chouaib
- 8
Please give me the hello world answer now.
29th Sep 2020, 1:55 AM
Isuru Dilshan
- 9
I don’t know
28th Sep 2020, 8:05 PM
Rezaul Karim