+ 1

PHP filter problem

When I filter the string "ß®q¸NiÞ" using the FILTER_SANITIZE_STRING filter with the FILTER_FLAG_STRIP_LOW | FILTER_FLAG_ENCODE_HIGH options, save it in a utf8mb4 database, and retrieve it, I get "ß®q¸NiÞ". Why is that? My default charset is UTF-8 and my HTML charset is also UTF-8.
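A minimal sketch of the filtering step on its own, without the database, may help narrow this down. It assumes FILTER_UNSAFE_RAW as a stand-in for FILTER_SANITIZE_STRING (which is deprecated since PHP 8.1); both accept the same two flags, and the sample string contains no tags or quotes, so the difference should not matter here. The variable names are illustrative only.

```php
<?php
// The sample string from the question, spelled out as UTF-8 bytes: "ß®q¸NiÞ"
$input = "\xC3\x9F\xC2\xAEq\xC2\xB8Ni\xC3\x9E";

// Assumption: FILTER_UNSAFE_RAW stands in for the deprecated FILTER_SANITIZE_STRING.
$flags = FILTER_FLAG_STRIP_LOW | FILTER_FLAG_ENCODE_HIGH;
$filtered = filter_var($input, FILTER_UNSAFE_RAW, $flags);

// FILTER_FLAG_ENCODE_HIGH works on single bytes, not on characters,
// so every byte above 127 becomes its own decimal entity.
echo $filtered, PHP_EOL;
```

Each two-byte UTF-8 character should come out as two entities, one per byte (for example "ß" as &#195;&#159;), which suggests the flag is not aware of multi-byte encodings.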

5th Sep 2020, 8:53 AM
Ore
14 Answers
+ 3
I did a quick Google search: https://www.htmlsymbols.xyz/unicode/U+00DE The UTF-8 bytes seem to be split in half. I don't have much flight time with PHP, but I do remember an mb_ function to change the encoding. It doesn't seem to ship with every PHP binary out there, though.
5th Sep 2020, 11:03 AM
Rei
+ 2
In the database, is it still "ß®q¸NiÞ", or is it already "ß®q¸NiÞ"?
5th Sep 2020, 9:12 AM
Rei
+ 2
The fact that some of the characters are split in half when represented in UTF-8, doesn't that mean it's not in UTF-8? Maybe change the encoding to UTF-16. Not sure if PHP on Sololearn has the mb modules to do that.
5th Sep 2020, 10:19 AM
Rei
+ 2
yep, that one
5th Sep 2020, 11:58 AM
Rei
+ 2
Rei Wow. It works after converting to UTF-16. You are my champ! Thanks for the assist.
5th Sep 2020, 12:10 PM
Ore
+ 2
glad i could help :D
5th Sep 2020, 12:12 PM
Rei
+ 2
i can relate 😅
5th Sep 2020, 12:20 PM
Rei
+ 1
Rei It is saved as ß®q¸NiÞ, which is ß®q¸NiÞ.
5th Sep 2020, 9:18 AM
Ore
+ 1
It is not a database problem. It is an issue with filter_var, I guess. If I do not use any filter options it works as expected. https://code.sololearn.com/wehJ066Zm5Yj/?ref=app
5th Sep 2020, 9:22 AM
Ore
+ 1
Yeah, at first I thought it was just an encoding problem. Turns out it's what the sanitization does.
5th Sep 2020, 9:56 AM
Rei
+ 1
Rei After some research I think the culprit is FILTER_FLAG_ENCODE_HIGH. It seems to be a known problem but I can't find a solution that works. https://stackoverflow.com/questions/14739110/php-filter-var-filter-flag-encode-high https://stackoverflow.com/questions/38363566/trouble-with-utf-8-characters-what-i-see-is-not-what-i-stored (I am using Doctrine 2 also)
5th Sep 2020, 10:01 AM
Ore
+ 1
Rei I don't understand. Do you mean that those characters are too high to be encoded as UTF-8? 'ß', a valid German character, has a decimal value of &#223; which is quite high. Is that why it is split as &#195;, &#159;, &#194;? If so, is the encoding process reversible? Is there a way to decode it?
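On the question of reversibility, here is a rough sketch of one way the per-byte entities could be turned back into the original bytes. The helper name decode_byte_entities is hypothetical (it is not a PHP built-in), and the approach assumes the text contains no decimal entities other than the ones the filter produced; any genuine entity with a value up to 255 would be converted as well.

```php
<?php
// Hypothetical helper, not part of PHP: converts decimal entities that
// represent single bytes (0-255) back into raw bytes.
function decode_byte_entities(string $encoded): string
{
    return preg_replace_callback(
        '/&#(\d{1,3});/',
        function (array $m): string {
            $value = (int) $m[1];
            // Values that do not fit in one byte are left untouched.
            return $value <= 255 ? chr($value) : $m[0];
        },
        $encoded
    );
}

// Entity sequence consistent with the values quoted above (195, 159, 194, ...).
$encoded = "&#195;&#159;&#194;&#174;q&#194;&#184;Ni&#195;&#158;";
$decoded = decode_byte_entities($encoded);

// Compare against the UTF-8 bytes of "ß®q¸NiÞ" and check the result is valid UTF-8.
var_dump($decoded === "\xC3\x9F\xC2\xAEq\xC2\xB8Ni\xC3\x9E");
var_dump(preg_match('//u', $decoded) === 1);
```

The sketch avoids the mbstring functions on purpose, since they may not be available in every PHP build.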
5th Sep 2020, 10:53 AM
Ore
+ 1
Rei mb_convert_encoding? I will give it a try.
5th Sep 2020, 11:58 AM
Ore
+ 1
Working with unicode is frustrating when using PHP. 🤒
5th Sep 2020, 12:17 PM
Ore