+ 4
Which logic python is using for sorting sets?
Next code puts 544 in the begining of set. Other numbers sorted correctly. Why 544 number in the begining? https://code.sololearn.com/cC7mF6171r9x/?ref=app
21 ответ
+ 8
My guess is that the elements in a set are ordered by their hash values (plus some tricks in case of collisions). Probably the hash value of 544 happens to be lower than that of 1, so it appears at the front.
More on hashing: https://www.sololearn.com/learn/668/?ref=app
+ 9
Kishalaya Saha well... it depends
universal hashing is a randomized algorithm🤔
https://en.m.wikipedia.org/wiki/Universal_hashing
but I'm also curious what decides the set order🤔
+ 8
it already surprised me that it gets the same result everytime
what I've test even get different results for the same set😥
https://code.sololearn.com/cPAtxp9XJ3sX
+ 8
The built-in hash function in python generates different hashes for the same thing for different runtimes. Try running `print(hash("wyahah"))` a few times, you'll get different results. So that may be the case for Flandre Scarlet 's code, it uses the hash function internally which produces different values for different runtimes so you get c a sometimes and a c some other times
+ 8
Mayur Garg, Rugved Modak, try Flandre and Yash's examples. Different outputs for the same input!
Flandre Scarlet what I meant by not using randomness is that the result of applying a fixed hash function on a given number should always be the same (so it's really a function in the true sense).
But universal hashing chooses a random hash function from a family of them, right? That's pretty cool!! That could explain your and Yash Kandalkar 's examples too.
Summary: it's definitely not the hash function that's provided by python, because, hash(1) < hash(543) < hash(544) but print({1, 543, 544}) gives {544, 1, 543}.
It still *could* be that a different (universal) hashing is used for sets and dictionary, and that is how they order things.
The mystery remains.
+ 6
Sets in Python are not meant to be sorted. If you want a sorted collection, you use lists instead.
What you can do is create a temporary list when you print your set:
for n in sorted (my_set):
print(n)
+ 6
Sets in python are unsorted as they are used to avoid duplicates. Hence hashing is used to store data in sets. Hence printing out sets will print an unsorted sequence of items much like as in a dictionary. The order doesn't depend on the order of insertion of items, nor on the their values but on hash values which we don't know.
Using the sorted() function returns a list(not a set) when used on a set. You can verify this by using
a=set()
print(type(sorted(a)))
This is why Kishalaya Saha 's
list(s) == sorted(s) is true as there is an identical list on either side. (And not a set).
The fact that 544 is unordered and the rest are ordered is merely a characteristic of the hashing function. If you add 101 to the set in your code, 101 too will be unordered.
+ 6
[On HASHES] Algorithms vary by object ("sets with mixed keys" ref'd link 5)
{link 1} "...the hash of a number x is based on the reduction of x modulo the prime P = 2**_PyHASH_BITS - 1. It's designed so that hash(x) == hash(y) whenever x and y are numerically equal, even if x and y have different types...quick summary: First define..."
{2} for tupleHash
{3} "...str, bytes and datetime objects are 'salted'...constant [per process]...not predictable between [runs]...protection against DoS {4}"
{5} List of algorithms and intended purposes
1: https://github.com/JUMP_LINK__&&__python__&&__JUMP_LINK/cpython/blob/master/Python/pyhash.c#L34
2: https://github.com/python/cpython/blob/master/Objects/tupleobject.c#L348
3: https://docs.python.org/3.5/reference/datamodel.html#object.__hash__
4: Multiple implementations denial-of-service via hash algorithm collision:
http://www.ocert.org/advisories/ocert-2011-003.html
5: Secure and interchangeable hash algorithm:
https://www.python.org/dev/peps/pep-0456/
+ 5
Anna it's possible that hash function is nice enough to respect the ordering for small numbers. BUT... I just tried this
s = set(range(1, 1000))
print(list(s) == sorted(s))
And the output was True. So it seems the set would put the numbers in the right order, even though 544 is included in the range!
Now I no longer think the ordering depends (only) on the hash value. Maybe someone else has a better idea?
+ 5
Kishalaya Saha list comparison only get true when each elements of both equals
+ 5
Kishalaya Saha maybe the hash function make a range(1,1000) occasionally(?) same as a sorted list, but it usually won't be like a sorted list🤔
+ 5
i found that the hash value for int is just the int itself
https://code.sololearn.com/cXlZgZ1az374
+ 4
Kishalaya Saha Wouldn't it be an extremely bad hash value method if the values are almost in perfect order except for one item? 🤔
+ 4
i dont know if this has been said but if not i would say they are sorted like this somehow....
https://code.sololearn.com/cJBPWKOU5ZvS/?ref=app
+ 4
Maybe this Code is interesting for your Discussion:
https://code.sololearn.com/cKU78zsBb1Y2/?ref=app
+ 3
Mayur Garg I know sorted(s) produces a list, which is precisely why I used it. Try the original example:
s = {1,5,87,6,5,4,2,544,16}
print(list(s) == sorted(s))
Output is False. How does your theory explain that?
+ 3
Flandre Scarlet I know. But "order in sets depends only on hash value" doesn't explain the why it gives false for s = {1,5,87,6,5,4,2,544,16} but true for s = set(range(1, 1000))
It's not clear to me why the hash value of 544 would be <= that of 1 in one example, but >= in the other. A hash function shouldn't use randomness!
+ 3
Kishalaya Saha simple hashing depends on number of elements, doesn't it? That could change indexing factor.
+ 3
Kishalaya Saha As to why values should determine the hashing used, I have something to add to that too.
Consider a set containing values 1 to 100
And a simple hashing function using mod.
So to store values in this set, let us say python uses mod 100 to store values. That will use up all indexes from 0 to 99 keeping the hash function more optimised.
But when we add more random values, using 'mod n' will result in more empty hash values in between causing the hashing to become less optimised. Hence python uses a different hashing mechanism to create a more compact hash.
Again, just a speculation.
+ 2
Mbrustler that's a very good guess, but doesn't appear to be correct (or the whole story).Try this:
print({1, 543, 544}) # {544, 1, 543}
print(id(544) <= id(1)) # True
print(id(1) <= id(543)) # False