C# lexicographic string comparison not working as expected
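Here is some code I wrote: https://sololearn.com/compiler-playground/cEkrQSBZFCQr/?ref=app
Comparing the characters 'a' and 'A' directly treats 'a' as greater, but String.Compare("a", "A") returns a negative value. What makes the two expressions give different results?
Edit (Solution): Have a look at the bottommost code: https://sololearn.com/compiler-playground/cuba9uC7vK1H/?ref=app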
19 Answers
+ 1
@Calvin Jude
I noticed a difference when calling the String.Compare() overload that accepts a StringComparison as its third argument:
Console.WriteLine(String.Compare("a", "A", StringComparison.Ordinal));
This one returns a positive integer (32 where I tested it), like one of the results I got from std::string::compare() in C++.
As I was reading this page (without thoroughly understanding it)
https://learn.microsoft.com/en-us/dotnet/standard/base-types/best-practices-strings
I saw it mentions linguistic comparison (if I got it right), which can change the outcome of a call to a string comparison method.
I'm not 100% sure, though, as the examples there involve different culture/globalization settings, while here only single-character strings are compared.
+ 2
@Calvin Jude
Nice formatting, now we can see which characters are considered equal in weight
Still, I'm not sure what conclusion can be drawn, or what causes the "weird" behaviour
And please, no need to mark my response; it didn't disclose anything. It was just an observation
If you know Java as well as you do C#, it might also be interesting to investigate whether such behaviour exists in Java too, since C# is conceptually Microsoft's vision of Java :)
+ 1
let me see....
ASCII
A = 65
a = 97
is 97 bigger than 65?
True
let me see the second one...
the return value of this method is always an integer:
less than zero: if the first string is lexicographically smaller than the second string;
zero: if both strings are equal;
greater than zero: if the first string is lexicographically greater than the second string;
I hope this helps
+ 1
Calvin Jude
is 'a' less than 'A'?
// 'a' < 'A'
no, so gives the negative value
+ 1
Interesting! It's surprising, though, because the same comparison always gives a positive value in C with strcmp() and in C++ with std::string::compare().
#include <iostream>
#include <string>
#include <cstring> // strcmp
int main(int argc, char *argv[])
{
    using std::string, std::cout;
    string first { "a" }, second { "A" };
    // using string::compare()
    cout << string{ "a" }.compare( string{ "A" } ) << '\n';
    cout << first.compare( "A" ) << '\n';
    cout << first.compare( second ) << '\n';
    cout << '\n';
    // using strcmp()
    cout << strcmp( "a", "A" ) << '\n';
    cout << strcmp( first.c_str(), second.c_str() ) << '\n';
    return 0;
}
+ 1
"So it seems like kind of a bug to me? I've seen the opposite behavior in cpp too."
I'm also curious, but let's not jump to conclusions here; let's hear what others have to say first...
"Guess I'll have to make sure the strings are lowercase first."
But why?
+ 1
Ipang Oh, so there's a third argument you could give to String.Compare. The code is a mess, but it's the output that I want you to see:
https://sololearn.com/compiler-playground/cR0xAu4dwF0b/?ref=app
+ 1
@Calvin Jude
"Really weird behavior. I guess all of the ones apart from Ordinal are only slightly different, the difference appearing only when more unicode chars are present (or according to specific combinations of letters). Interesting."
Indeed, weird and yet still very interesting. So it shows how 'a' weighs less than 'A', except when StringComparison.Ordinal is passed to String.Compare().
I was rather surprised to see the output for StringComparison.OrdinalIgnoreCase. I was expecting something like this...
"...< A < a < B < b..."
But instead I see...
"...< a < A < b < B..."
I really wish some C# master could shed a bit of light now; this has become confusing :)
+ 1
Ipang I'm really sorry to say this, but my code didn't show the true result of the comparison; I forgot to include the equality operator. Please check the code again; I've edited it accordingly. Now you can clearly see how OrdinalIgnoreCase differs from Ordinal and the culture-based options.
+ 1
Ipang I considered marking my own answer as the best, as *technically* that's what really answers the question. But I never would have done it without your observation. Thank you for helping me out! This was a worthy learning journey indeed
+ 1
@Calvin Jude
TBH I'm still puzzled, still not getting a clear view
I'd appreciate it if you could edit the post's Description to add a bit of what you concluded (I presume you did) from this discussion. Future visitors might find that useful, and the Description is the first thing they see, so they won't need to scroll through pages unless they want to follow how the discussion unfolded :)
+ 1
Ipang I've edited my question. It seems InvariantCulture orders characters according to a fixed standard (a < A < accented a < b), a far more involved comparison than what Ordinal does. CurrentCulture uses the comparison rules of your local machine's culture rather than a simple Unicode comparison. You could draw a lot of inferences from the output (if you run it somewhere other than here); Sololearn apparently isn't printing accents.
0
Mihaly Nyilas But the second statement gives me a negative value, implying the first string is less than the second, exactly the opposite result of the char comparison. Why?
0
Mihaly Nyilas
String.Compare("a", "b") also evaluates to -1. This means that lowercase "a" with an ASCII of 97 is ranked below "A" (65). Where's the anomaly?
0
is 97 less than 98?
// 97 < 98
try these as well:
String.Compare("b", "a");
String.Compare("A", "a");
both should give 1
0
Ipang So it seems like kind of a bug to me? I've seen the opposite behavior in cpp too.
0
Guess I'll have to make sure the strings are lowercase first.
0
It seems like it takes a lot longer to get a C# question answered; there seem to be fewer C# users here.
0
Really weird behavior. I guess all of the ones apart from Ordinal are only slightly different, the difference appearing only when more unicode chars are present (or according to specific combinations of letters). Interesting.