+ 31
Utf
What is difference between utf-8 and utf-16
16 Answers
+ 40
Part 1
Both UTF-8 and UTF-16 are variable length encodings. However, in UTF-8 a character may occupy a minimum of 8 bits, while in UTF-16 character length starts with 16 bits.
Main UTF-8 pros:
Basic ASCII characters like digits, Latin characters with no accents, etc. occupy one byte which is identical to US-ASCII representation. This way all US-ASCII strings become valid UTF-8, which provides decent backwards compatibility in many cases.
No null bytes, which allows to use null-terminated strings, this introduces a great deal of backwards compatibility too.
UTF-8 is independent of byte order, so you don't have to worry about Big Endian / Little Endian issue.
Main UTF-8 cons:
Many common characters have different length, which slows indexing by codepoint and calculating a codepoint count terribly.
Even though byte order doesn't matter, sometimes UTF-8 still has BOM (byte order mark) which serves to notify that the text is encoded in UTF-8.
+ 25
Part 2
Main UTF-16 pros:
BMP (basic multilingual plane) characters, including Latin, Cyrillic, most Chinese (the PRC made support for some codepoints outside BMP mandatory), most Japanese can be represented with 2 bytes. This speeds up indexing and calculating codepoint count in case the text does not contain supplementary characters.
Even if the text has supplementary characters, they are still represented by pairs of 16-bit values, which means that the total length is still divisible by two and allows to use 16-bit char as the primitive component of the string.
Main UTF-16 cons:
Lots of null bytes in US-ASCII strings, which means no null-terminated strings and a lot of wasted memory.
Using it as a fixed-length encoding “mostly works” in many common scenarios (especially in US / EU / countries with Cyrillic alphabets / Israel / Arab countries / Iran and many others), often leading to broken support where it doesn't.
+ 19
Last words:
In general, UTF-16 is usually better for in-memory representation because BE/LE is irrelevant there (just use native order) and indexing is faster (just don't forget to handle surrogate pairs properly). UTF-8, on the other hand, is extremely good for text files and network protocols because there is no BE/LE issue and null-termination often comes in handy, as well as ASCII-compatibility.
Not totally my words.
Happy Coding </>
+ 9
It is encoding of file which is the ability to show i.e(from input -> binary > to Screen)different languages or programming language in you laptop
for more see here https://en.m.wikipedia.org/wiki/Comparison_of_Unicode_encodings
+ 6
UTF-8 = 8 bits variable Length
UTF-16 = 16 bits variable Length
May it'll be Helpful & easy to understand 😃😃👼
+ 5
UTF-8 is identical to ASCII for the values from 0 to 127.
UTF-8 does not use the values from 128 to 159.
UTF-8 is identical to both ANSI and 8859-1 for the values from 160 to 255.
UTF-8 continues from the value 256 with more than 10 000 different characters.
- 1
Wow Raj 丂尺ノ√ム丂イム√ム you are the best 🙌
- 1
Hi
- 2
Hi
- 5
How to make app
- 6
Way not being
- 8
Nicee
- 8
Hello
- 8
Hello
- 8
Please give me hello word to the answer now..
- 9
I don’t no