+ 1
PHP filter problem
When I filter the string "ß®q¸NiÞ" using the FILTER_SANITIZE_STRING filter with the FILTER_FLAG_STRIP_LOW | FILTER_FLAG_ENCODE_HIGH flags, save it in a utf8mb4 database, and retrieve it, I get "ÃŸÂ®qÂ¸NiÃž". Why is that? My default charset is UTF-8 and my HTML charset is also UTF-8.
14 Answers
+ 3
i did a quick google search
https://www.htmlsymbols.xyz/unicode/U+00DE
the UTF-8 characters seem to be split in half
I don't have much flight time with PHP. I do remember an mb_ function to change the encoding, but it doesn't seem to ship with every PHP binary out there
+ 2
in the database, is it still "ß®q¸NiÞ", or already "ÃŸÂ®qÂ¸NiÃž"?
+ 2
the fact that some of the characters are split in half when represented in UTF-8, doesn't that mean it's not in UTF-8? maybe change the encoding to UTF-16
not sure if PHP on SoloLearn has the mb modules to do that
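A minimal sketch of how the "is mbstring there?" doubt could be checked before converting. The helper name `toUtf16OrNull` is hypothetical, and the choice of UTF-16BE as the target is an assumption; the thread does not say which variant was used. It only shows the guarded call, not that the conversion fixes the filter output.

```php
<?php
// Hypothetical helper: convert UTF-8 to UTF-16BE only when mbstring is available.
function toUtf16OrNull(string $utf8): ?string
{
    if (!function_exists('mb_convert_encoding')) {
        return null; // mbstring is not loaded in this PHP build
    }
    // Arguments are: string, target encoding, source encoding.
    return mb_convert_encoding($utf8, 'UTF-16BE', 'UTF-8');
}

$result = toUtf16OrNull("\u{00DF}"); // "ß" as UTF-8 bytes
echo $result === null ? 'mbstring missing' : bin2hex($result), PHP_EOL;
```

When mbstring is loaded, "ß" becomes the two bytes 00 DF; otherwise the helper returns null and the caller can fall back to something else.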
+ 2
yep, that one
+ 2
Rei Wow. It works after converting to UTF-16. You are my champ! Thanks for the assist.
+ 2
glad i could help :D
+ 2
i can relate 😅
+ 1
Rei It is saved as &#195;&#159;&#194;&#174;q&#194;&#184;Ni&#195;&#158;
Which is ÃŸÂ®qÂ¸NiÃž
+ 1
It is not a database problem. It is an issue with filter_var, I guess. If I do not use any filter options it works as expected.
https://code.sololearn.com/wehJ066Zm5Yj/?ref=app
+ 1
yeah, at first I thought it was just an encoding problem. turns out it's what the sanitization does.
+ 1
Rei After some research I think the culprit is FILTER_FLAG_ENCODE_HIGH. It seems to be a known problem but I can't find a solution that works.
https://stackoverflow.com/questions/14739110/php-filter-var-filter-flag-encode-high
https://stackoverflow.com/questions/38363566/trouble-with-utf-8-characters-what-i-see-is-not-what-i-stored (I am using Doctrine 2 also)
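To make the FILTER_FLAG_ENCODE_HIGH suspicion concrete, here is a small sketch comparing the filter with and without the flag. Two assumptions: FILTER_UNSAFE_RAW stands in for FILTER_SANITIZE_STRING (which is deprecated as of PHP 8.1), and `encodeHighBytes` is a hypothetical helper that imitates the flag byte by byte, not part of PHP.

```php
<?php
// "ß®q¸NiÞ" written with escapes so the bytes are UTF-8 however this file is saved.
$input = "\u{00DF}\u{00AE}q\u{00B8}Ni\u{00DE}";

// Stand-in filter (assumption): FILTER_UNSAFE_RAW also accepts FILTER_FLAG_ENCODE_HIGH.
$encoded = filter_var($input, FILTER_UNSAFE_RAW, FILTER_FLAG_ENCODE_HIGH);
$plain   = filter_var($input, FILTER_UNSAFE_RAW);

// Hypothetical helper: every byte above 127 becomes its own "&#NNN;" entity.
function encodeHighBytes(string $s): string
{
    $out = '';
    foreach (str_split($s) as $byte) {
        $code = ord($byte);
        $out .= $code > 127 ? "&#{$code};" : $byte;
    }
    return $out;
}

echo bin2hex($input), PHP_EOL;        // c39fc2ae71c2b84e69c39e
echo $encoded, PHP_EOL;               // &#195;&#159;&#194;&#174;q&#194;&#184;Ni&#195;&#158;
var_dump(encodeHighBytes($input) === $encoded); // bool(true)
var_dump($plain === $input);                    // bool(true)
```

The flag works on bytes, not on characters, so each two-byte UTF-8 character turns into two separate entities. A browser then shows each entity as its own character, which is where Ã, Ÿ and Â come from.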
+ 1
Rei I don't understand. Do you mean that those characters are too high to be encoded as UTF-8?
'ß', a valid German character, has a decimal value of 223, which is quite high. Is that why it is split as Ã, Ÿ, Â?
If so, is the encoding process reversible? Is there a way to decode it?
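On the reversibility question, a hedged sketch: if the stored value is the entity text (one "&#NNN;" per byte), the bytes can be rebuilt with a regex. `decodeHighByteEntities` is a hypothetical helper, not a PHP built-in, and it assumes the original text never contained a literal "&#NNN;" sequence in the 128 to 255 range, since that would be decoded too.

```php
<?php
// Hypothetical helper: turn "&#NNN;" entities for byte values 128-255 back into raw bytes.
function decodeHighByteEntities(string $s): string
{
    $result = preg_replace_callback(
        '/&#(\d{3});/',
        static function (array $m): string {
            $code = (int) $m[1];
            // Leave anything outside the high-byte range untouched.
            return ($code >= 128 && $code <= 255) ? chr($code) : $m[0];
        },
        $s
    );
    return $result ?? $s; // fall back to the input if the regex engine fails
}

$stored  = '&#195;&#159;&#194;&#174;q&#194;&#184;Ni&#195;&#158;';
$decoded = decodeHighByteEntities($stored);

echo bin2hex($decoded), PHP_EOL; // c39fc2ae71c2b84e69c39e, the UTF-8 bytes of "ß®q¸NiÞ"
```

Rebuilding raw bytes with chr() keeps the two-byte pairs together, so the result is valid UTF-8 again.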
+ 1
Rei
mb_convert_encoding? I will give it a try.
+ 1
Working with Unicode is frustrating when using PHP. 🤒