+ 2

C# lexicographic string comparison not working as expected

Here is some code I wrote: https://sololearn.com/compiler-playground/cEkrQSBZFCQr/?ref=app What makes the two expressions different? Edit (Solution): Have a look at the bottommost code. https://sololearn.com/compiler-playground/cuba9uC7vK1H/?ref=app

8th Jul 2024, 7:41 PM
Calvin Jude
19 Answers
+ 1
@Calvin Jude I noticed a difference when calling one of the String.Compare() overloads that accepts a StringComparison as its third argument: Console.WriteLine(String.Compare("a", "A", StringComparison.Ordinal)); This results in a positive integer (32 where I tested), like the results from string::compare() in C++. While reading (not thoroughly understanding) this page https://learn.microsoft.com/en-us/dotnet/standard/base-types/best-practices-strings I saw that they mention linguistic comparison (if I get it right), which may change the outcome of a call to a string comparison method. I'm not 100% sure, as the examples shown there involve different culture/globalisation settings, while here only one-character strings were compared.
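The observation above can be reproduced with a minimal sketch like the one below. The class and method names are made up for illustration. The culture-sensitive result assumes a runtime with normal globalization support; in invariant-globalization mode, culture-sensitive comparisons may behave like ordinal ones. Only the sign of the return value is documented, so the sketch prints Math.Sign of the result rather than the raw number.

```csharp
using System;

public static class OrdinalVsCulture  // hypothetical name, for illustration only
{
    // Sign of the culture-sensitive comparison (uses the current culture).
    public static int CultureSign(string x, string y) =>
        Math.Sign(String.Compare(x, y));

    // Sign of the ordinal comparison (compares UTF-16 code units numerically).
    public static int OrdinalSign(string x, string y) =>
        Math.Sign(String.Compare(x, y, StringComparison.Ordinal));

    public static void Main()
    {
        // Typically -1: linguistic rules usually sort lowercase before uppercase.
        Console.WriteLine(CultureSign("a", "A"));

        // 1: 'a' is U+0061 (97) and 'A' is U+0041 (65), so "a" sorts after "A".
        Console.WriteLine(OrdinalSign("a", "A"));
    }
}
```

The raw value 32 seen with StringComparison.Ordinal is simply 97 - 65, but code should rely on the sign only.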
9th Jul 2024, 10:24 AM
Ipang
+ 2
@Calvin Jude Nice formatting, now we can see which characters are considered equal in weight. Though I'm still not sure what conclusion can be drawn, nor what caused the "weird" behaviour. And please, no need to mark my response; it didn't disclose anything, it was just an observation. If you know Java as well as you do C#, it might also be interesting to investigate whether such behaviour exists in Java as well, since C# is conceptually Microsoft's vision of Java :)
12th Jul 2024, 2:07 AM
Ipang
+ 1
Let me see... In ASCII, A = 65 and a = 97. Is 97 bigger than 65? True. Now for the second one: the return type of this method is always an integer. Less than zero: the first string is lexicographically smaller than the second string. Zero: both strings are equal. Greater than zero: the first string is lexicographically greater than the second string. I hope this helps.
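As a rough illustration of that three-way contract, here is a small sketch. The names CompareContract and Relation are hypothetical helpers, not part of .NET. It uses StringComparison.Ordinal explicitly so that the string results follow the same numeric ordering as the char comparison; the two-argument String.Compare is culture-sensitive and can order "a" and "A" differently.

```csharp
using System;

public static class CompareContract  // hypothetical name, for illustration only
{
    // Turns a comparison result into a relation symbol; only the sign matters.
    public static string Relation(int result) =>
        result < 0 ? "<" : result > 0 ? ">" : "==";

    public static void Main()
    {
        // char values compare by numeric code: 'a' is 97, 'A' is 65.
        Console.WriteLine('a' > 'A');   // True
        Console.WriteLine('a' - 'A');   // 32

        // Ordinal string comparison follows the same numeric ordering.
        string[][] pairs =
        {
            new[] { "a", "b" },   // a < b
            new[] { "b", "a" },   // b > a
            new[] { "a", "a" },   // a == a
            new[] { "a", "A" },   // a > A
        };
        foreach (var p in pairs)
        {
            int r = String.Compare(p[0], p[1], StringComparison.Ordinal);
            Console.WriteLine($"{p[0]} {Relation(r)} {p[1]}");
        }
    }
}
```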
8th Jul 2024, 8:01 PM
Mihaly Nyilas
+ 1
Calvin Jude is 'a' less than 'A'? // 'a' < 'A' no, so gives the negative value
9th Jul 2024, 4:20 AM
Mihaly Nyilas
+ 1
Interesting! It's a wonder though, because such a comparison always results in a positive value in C using strcmp(), and in C++ using std::string::compare().

#include <iostream>
#include <string>
#include <cstring> // strcmp

int main(int argc, char *argv[])
{
    using std::string, std::cout;
    string first { "a" }, second { "A" };

    // using string::compare()
    cout << string{ "a" }.compare( string{ "A" } ) << '\n';
    cout << first.compare( "A" ) << '\n';
    cout << first.compare( second ) << '\n';
    cout << '\n';

    // using strcmp()
    cout << strcmp( "a", "A" ) << '\n';
    cout << strcmp( first.c_str(), second.c_str() ) << '\n';
    return 0;
}
9th Jul 2024, 8:26 AM
Ipang
+ 1
"So it seems like kind of a bug to me? I've seen the opposite behavior in cpp too." I'm also curious, but let's not jump to conclusions here; let's hear what others have to say first... "Guess I've to ensure the strings are in lowercase first." But why?
9th Jul 2024, 8:58 AM
Ipang
+ 1
Ipang Oh, so there's a third argument you could give to String.Compare. The code is a mess, but it's the output that I want you to see: https://sololearn.com/compiler-playground/cR0xAu4dwF0b/?ref=app
10th Jul 2024, 4:42 PM
Calvin Jude
+ 1
@Calvin Jude "Really weird behavior. I guess all of the ones apart from Ordinal are only slightly different, the difference appearing only when more Unicode chars are present (or according to specific combinations of letters). Interesting." Indeed, weird and yet still very interesting. So it shows how 'a' weighs less than 'A', except where StringComparison.Ordinal was used as the String.Compare() argument. I was rather surprised to see the output for StringComparison.OrdinalIgnoreCase. I was expecting something like this... "...< A < a < B < b..." But instead I see... "...< a < A < b < B..." I really wish some C# master could shed a bit of light now, as this has started some confusion :)
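One possible reading of that output: with StringComparison.OrdinalIgnoreCase, "a" and "A" compare as equal, so a chain printed with only "<" between items cannot show their true relation. The sketch below checks the sign directly. The class and method names are hypothetical and exist only for this illustration.

```csharp
using System;

public static class IgnoreCaseDemo  // hypothetical name, for illustration only
{
    // Sign of the case-insensitive ordinal comparison.
    public static int IgnoreCaseSign(string x, string y) =>
        Math.Sign(String.Compare(x, y, StringComparison.OrdinalIgnoreCase));

    public static void Main()
    {
        Console.WriteLine(IgnoreCaseSign("a", "A"));  // 0: equal once case is ignored
        Console.WriteLine(IgnoreCaseSign("A", "b"));  // -1: "A" sorts before "b"
        Console.WriteLine(IgnoreCaseSign("b", "A"));  // 1: "b" sorts after "A"

        // The same equality, expressed as a boolean check.
        Console.WriteLine(
            String.Equals("a", "A", StringComparison.OrdinalIgnoreCase));  // True
    }
}
```

Since the two letters tie, the order in which a sort places "a" and "A" next to each other under this comparison carries no meaning.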
10th Jul 2024, 7:49 PM
Ipang
+ 1
Ipang I'm really sorry to say this, but my code didn't represent the true result of the comparison; I forgot to involve the equality operator. Please check the code once again, I've edited it to meet the needs. You can truly see how OrdinalIgnoreCase is different from Ordinal and Culture
11th Jul 2024, 5:32 AM
Calvin Jude
+ 1
Ipang I considered marking my own answer as the best, as *technically* that's what really answers the question. But I never would have done it without your observation. Thank you for helping me out! This was a worthy learning journey indeed
12th Jul 2024, 9:02 AM
Calvin Jude
+ 1
@Calvin Jude TBH I'm still puzzled, still not getting a clear view. I would appreciate it if you could edit the post's Description, adding a bit of what you concluded (I presume you have) from this discussion. Future visitors might find that useful. The post Description is the first thing future visitors see, so there's no need to scroll through pages unless they choose to see how it unfolded chronologically :)
12th Jul 2024, 5:23 PM
Ipang
+ 1
Ipang I've edited my question. It seems InvariantCulture groups characters according to a set standard (a < A < accented a < b), a much more complex comparison than what Ordinal does. CurrentCulture uses the comparison rules of your local machine's culture rather than a simple Unicode code point comparison. You could make a lot of inferences from the output (if run somewhere other than here). Sololearn apparently isn't printing accents.
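A minimal sketch of that conclusion, sorting the same letters with three different comparers. SortDemo and SortWith are hypothetical names used only here, and the accented letter is written as an escape because some consoles do not print it. Only the ordinal order is stated as a comment, since the culture-sensitive results depend on the runtime's globalization data and the machine's current culture.

```csharp
using System;

public static class SortDemo  // hypothetical name, for illustration only
{
    // Returns a sorted copy so the input array is left untouched.
    public static string[] SortWith(string[] items, StringComparer comparer)
    {
        var copy = (string[])items.Clone();
        Array.Sort(copy, comparer);
        return copy;
    }

    public static void Main()
    {
        // "\u00E1" is a lowercase 'a' with an acute accent.
        string[] letters = { "b", "A", "\u00E1", "a", "B" };

        // Ordinal sorts by code unit: A (65), B (66), a (97), b (98), then U+00E1 (225).
        Console.WriteLine(string.Join(" ", SortWith(letters, StringComparer.Ordinal)));

        // Linguistic comparers group letters first, then accents and case.
        Console.WriteLine(string.Join(" ", SortWith(letters, StringComparer.InvariantCulture)));
        Console.WriteLine(string.Join(" ", SortWith(letters, StringComparer.CurrentCulture)));
    }
}
```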
16th Jul 2024, 3:58 PM
Calvin Jude
0
Mihaly Nyilas But the second statement gives me a negative value, implying the first string is less than the second, exactly the opposite result of the char comparison. Why?
9th Jul 2024, 3:21 AM
Calvin Jude
0
Mihaly Nyilas String.Compare("a", "b") also evaluates to -1. This means that lowercase "a" with an ASCII value of 97 is ranked below "A" (65). Where's the anomaly?
9th Jul 2024, 7:48 AM
Calvin Jude
0
Is 97 less than 98? // 97 < 98 Try these as well: String.Compare("b", "a"); String.Compare("A", "a"); Both should give 1.
9th Jul 2024, 8:20 AM
Mihaly Nyilas
0
Ipang So it seems like kind of a bug to me? I've seen the opposite behavior in cpp too.
9th Jul 2024, 8:43 AM
Calvin Jude
0
Guess I've to ensure the strings are in lowercase first.
9th Jul 2024, 8:44 AM
Calvin Jude
0
It seems like it takes a lot longer to get a C# query answered; there seem to be fewer C# users here.
9th Jul 2024, 9:25 AM
Calvin Jude
0
Really weird behavior. I guess all of the ones apart from Ordinal are only slightly different, the difference appearing only when more Unicode chars are present (or according to specific combinations of letters). Interesting.
10th Jul 2024, 4:43 PM
Calvin Jude