What is Unicode?
4 Answers
Computers only know numbers, humans only know text. So we need a way to translate text into numbers and numbers into text. We do that with an encoding/charset.
Historically, computers in the USA used ASCII; the uppercase letter "L", for example, is number 76 in ASCII. The problem with ASCII is that it contains only the basic Latin letters (no accents or umlauts), so the rest of the world couldn't use it and developed their own charsets, for example Shift_JIS in Japan.
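A quick Python sketch of the "L" = 76 example (Python is just my choice for illustration): `ord()` gives a character's number, and `encode`/`decode` translate between text and bytes.

```python
# ASCII maps each character to a number from 0 to 127.
text = "L"
print(ord(text))             # 76: the number ASCII (and Unicode) assigns to "L"

# Encoding turns text into bytes; decoding turns bytes back into text.
data = text.encode("ascii")
print(list(data))            # [76]
print(data.decode("ascii"))  # L

# Characters outside ASCII's basic Latin letters cannot be encoded at all.
try:
    "ü".encode("ascii")
except UnicodeEncodeError:
    print("ü is not in ASCII")
```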
Having all these different charsets is, of course, a compatibility nightmare. If I send a German document full of umlauts to a Japanese friend, his text editor may assume it is Shift_JIS, and to him my document will look like garbage.
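Here is a small sketch of that mix-up, assuming (hypothetically) that the German document was saved as ISO 8859-1 and the receiving editor guesses Shift_JIS. The exact garbage depends on the decoder, so I only check that it differs from the original.

```python
# Hypothetical scenario: the sender saved the text as ISO 8859-1 (Latin-1).
original = "Grüße aus Köln"
sent_bytes = original.encode("latin-1")

# The receiver's editor guesses Shift_JIS. errors="replace" puts U+FFFD
# wherever a byte sequence is invalid in Shift_JIS instead of raising.
garbled = sent_bytes.decode("shift_jis", errors="replace")
print(garbled)

# Decoding with the charset that was actually used restores the text.
restored = sent_bytes.decode("latin-1")
print(restored)  # Grüße aus Köln
```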
Unicode is an encoding* that aims to include the letters and symbols of every language ever written (even languages no longer spoken!), so anyone in the world can write in Unicode and anyone else can read it. It's the dominant standard today and almost everyone uses it (though ASCII is still important).
____
* Strictly speaking, Unicode is a character set, and there are three standard encodings for it.
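To make the "all languages" idea concrete, here is a Python sketch that prints the number (the "code point", written U+XXXX) Unicode assigns to characters from a few scripts, including Gothic, a language nobody speaks anymore. The sample characters are my own picks.

```python
import unicodedata

# Unicode gives every character a number called a code point.
# Samples: Latin, German umlaut, Cyrillic, Japanese hiragana, Gothic.
samples = ["L", "ä", "Ж", "あ", "\U00010330"]

for ch in samples:
    # unicodedata.name() returns the official Unicode character name.
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")
```

Note that "L" is still 76 (U+004C): Unicode kept ASCII's numbers.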
UTF-8, UTF-16, and UTF-32.
ASCII (and ISO 8859-1, which came later) just gave each character a number. ASCII uses 7 bits, so it has only 2^7 = 128 characters. ISO 8859-1 uses a full byte (8 bits), so it can hold 2^8 = 256 different characters; its first 128 are the same as ASCII.
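A Python sketch of the 128 vs. 256 limit: "ä" fits in one ISO 8859-1 byte but has no ASCII number, and a Japanese character fits in neither.

```python
# ASCII: 7 bits -> 128 values. ISO 8859-1 (Latin-1): 8 bits -> 256 values.
print(2**7, 2**8)                 # 128 256

# "ä" fits in a single Latin-1 byte...
latin1_bytes = "ä".encode("latin-1")
print(latin1_bytes)               # b'\xe4'

# ...but not in ASCII,
try:
    "ä".encode("ascii")
except UnicodeEncodeError:
    print("ä is not in ASCII")

# and anything beyond the 256 Latin-1 slots has no byte at all.
try:
    "あ".encode("latin-1")
except UnicodeEncodeError:
    print("あ is not in ISO 8859-1")
```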
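Unicode has room for over a million characters (1,114,112 code points, of which about 150,000 are assigned so far), so 1 byte isn't enough; that's why in UTF-8 a single character can take up to 4 bytes. German umlauts (äöü), for example, are 2 bytes in UTF-8. And "👱🏽" isn't even one character: it is "person with blond hair" (U+1F471) followed by "medium skin tone modifier" (U+1F3FD), each 4 bytes in UTF-8, for 8 bytes total.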
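Where do the 1 to 4 bytes come from? Below is a hand-written UTF-8 encoder in Python (a sketch: `utf8_encode` is my own helper and skips validation such as rejecting surrogates), checked against Python's built-in codec.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one code point as UTF-8 by hand (sketch; no validation)."""
    if code_point < 0x80:      # up to 7 bits  -> 1 byte:  0xxxxxxx
        return bytes([code_point])
    if code_point < 0x800:     # up to 11 bits -> 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:   # up to 16 bits -> 3 bytes: 1110xxxx + 2 continuation
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # up to 21 bits (max U+10FFFF) -> 4 bytes: 11110xxx + 3 continuation
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


# "L", an umlaut, a hiragana letter, and the blond-hair emoji + skin tone.
for text in ["L", "ä", "あ", "\U0001F471\U0001F3FD"]:
    encoded = b"".join(utf8_encode(ord(ch)) for ch in text)
    assert encoded == text.encode("utf-8")  # agrees with Python's codec
    print(repr(text), "code points:", len(text), "UTF-8 bytes:", len(encoded))
```

The emoji line reports 2 code points and 8 bytes, matching the count above.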
The difference between UTF-8, -16, and -32 is the size of the basic unit: UTF-8 uses 1-byte units (1 to 4 per character), UTF-16 uses 2-byte units (1 or 2 per character), and UTF-32 always uses 4 bytes per character.
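A Python sketch comparing the three on a few characters (the sample characters are my own picks). I use the little-endian codecs `utf-16-le` and `utf-32-le` so Python doesn't prepend a byte-order mark and the counts stay honest.

```python
# UTF-8 uses 1-byte units, UTF-16 2-byte units, UTF-32 4-byte units.
# The "-le" (little-endian) codecs are used so no byte-order mark is added.
ENCODINGS = ("utf-8", "utf-16-le", "utf-32-le")

def encoded_sizes(ch):
    """Bytes needed for `ch` in UTF-8, UTF-16 and UTF-32 (in that order)."""
    return [len(ch.encode(enc)) for enc in ENCODINGS]

# U+0061 "a", U+00E4 "ä", U+20AC "€", U+1F471 (person with blond hair)
for ch in ["a", "ä", "€", "\U0001F471"]:
    print(f"U+{ord(ch):04X}", encoded_sizes(ch))
```

Only UTF-32 is fixed-width; the emoji needs two 2-byte units in UTF-16.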
The first 128 code points in Unicode are the same as in ASCII (and the first 256 match ISO 8859-1), and UTF-8 stores those first 128 as single bytes. That's why we use UTF-8 on the web: English text is 1 byte per character, most characters in other European text are too, and we don't waste space and bandwidth (UTF-32 text would be 4 times as big and mostly zeros).
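A Python sketch of both claims: the first 256 code points line up with ISO 8859-1 byte values, and plain English text is 4 times bigger in UTF-32, with three quarters of the bytes being zero. The sample sentence is arbitrary.

```python
# Latin-1 byte value b decodes to code point U+00b, for all 256 bytes.
latin1 = bytes(range(256)).decode("latin-1")
assert all(ord(ch) == i for i, ch in enumerate(latin1))

# In UTF-8 only the first 128 (the ASCII range) stay single bytes.
print(len(chr(0x7F).encode("utf-8")), len(chr(0xE9).encode("utf-8")))  # 1 2

# English text: 1 byte per character in UTF-8, 4 in UTF-32.
text = "Hello, world"
utf8 = text.encode("utf-8")
utf32 = text.encode("utf-32-le")   # "-le": no byte-order mark
print(len(utf8), len(utf32))       # 12 48
print(utf32.count(0) / len(utf32)) # 0.75 (three zero bytes per character)
```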
Many programming languages use UTF-16 internally; Java and JavaScript strings, for example, are sequences of UTF-16 code units. (Comment too long, so I'll stop...)
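One visible consequence: characters above U+FFFF take two UTF-16 units (a "surrogate pair"), which is why JavaScript reports `"👱".length` as 2. Python's own `str` is not UTF-16 internally, so this sketch computes the units by hand; `utf16_code_units` is my own helper, not a library function.

```python
def utf16_code_units(code_point):
    """Split a code point into UTF-16 code units (sketch; no validation)."""
    if code_point < 0x10000:
        return [code_point]           # fits in one 16-bit unit
    v = code_point - 0x10000          # 20 bits remain
    high = 0xD800 + (v >> 10)         # high (lead) surrogate: top 10 bits
    low = 0xDC00 + (v & 0x3FF)        # low (trail) surrogate: bottom 10 bits
    return [high, low]

emoji = "\U0001F471\U0001F3FD"        # blond hair + medium skin tone
units = [u for ch in emoji for u in utf16_code_units(ord(ch))]
print([hex(u) for u in units])
print(len(emoji), "code points,", len(units), "UTF-16 code units")
```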
What are those three encodings?