+ 1
What is utf-8 , Utf-16 , encoding ?
What is used in html or any lang.
2 Answers
+ 8
âȘIntroduction to Unicode and UTF-8
https://flaviocopes.com/unicode/
+ 1
They're simply different schemes for representing Unicode characters.
Both are variable-length - UTF-16 uses 2 bytes for all characters in the basic multilingual plane (BMP) which contains most characters in common use.
UTF-8 uses between 1 and 3 bytes for characters in the BMP, up to 4 for characters in the current Unicode range of U+0000 to U+1FFFFF, and is extensible up to U+7FFFFFFF if that ever becomes necessary... but notably all ASCII characters are represented in a single byte each.
For the purposes of a message digest it won't matter which of these you pick, so long as everyone who tries to recreate the digest uses the same option.
See this page for more about UTF-8 and Unicode.
(Note that all Java characters are UTF-16 code points within the BMP; to represent characters above U+FFFF you need to use surrogate pairs in Java.)