+ 4

Which logic python is using for sorting sets?

Next code puts 544 in the begining of set. Other numbers sorted correctly. Why 544 number in the begining? https://code.sololearn.com/cC7mF6171r9x/?ref=app

23rd Oct 2018, 11:17 AM
Horman Lewis
Horman Lewis - avatar
21 Answers
+ 8
My guess is that the elements in a set are ordered by their hash values (plus some tricks in case of collisions). Probably the hash value of 544 happens to be lower than that of 1, so it appears at the front. More on hashing: https://www.sololearn.com/learn/668/?ref=app
23rd Oct 2018, 11:45 AM
Kishalaya Saha
Kishalaya Saha - avatar
+ 9
Kishalaya Saha well... it depends universal hashing is a randomized algorithm🤔 https://en.m.wikipedia.org/wiki/Universal_hashing but I'm also curious what decides the set order🤔
23rd Oct 2018, 2:00 PM
Flandre Scarlet
Flandre Scarlet - avatar
+ 8
it already surprised me that it gets the same result everytime what I've test even get different results for the same set😥 https://code.sololearn.com/cPAtxp9XJ3sX
23rd Oct 2018, 1:23 PM
Flandre Scarlet
Flandre Scarlet - avatar
+ 8
The built-in hash function in python generates different hashes for the same thing for different runtimes. Try running `print(hash("wyahah"))` a few times, you'll get different results. So that may be the case for Flandre Scarlet 's code, it uses the hash function internally which produces different values for different runtimes so you get c a sometimes and a c some other times
23rd Oct 2018, 2:11 PM
Yash✳️
Yash✳️ - avatar
+ 8
Mayur Garg, Rugved Modak, try Flandre and Yash's examples. Different outputs for the same input! Flandre Scarlet what I meant by not using randomness is that the result of applying a fixed hash function on a given number should always be the same (so it's really a function in the true sense). But universal hashing chooses a random hash function from a family of them, right? That's pretty cool!! That could explain your and Yash Kandalkar 's examples too. Summary: it's definitely not the hash function that's provided by python, because, hash(1) < hash(543) < hash(544) but print({1, 543, 544}) gives {544, 1, 543}. It still *could* be that a different (universal) hashing is used for sets and dictionary, and that is how they order things. The mystery remains.
23rd Oct 2018, 2:28 PM
Kishalaya Saha
Kishalaya Saha - avatar
+ 6
Sets in Python are not meant to be sorted. If you want a sorted collection, you use lists instead. What you can do is create a temporary list when you print your set: for n in sorted (my_set): print(n)
23rd Oct 2018, 11:29 AM
HonFu
HonFu - avatar
+ 6
Sets in python are unsorted as they are used to avoid duplicates. Hence hashing is used to store data in sets. Hence printing out sets will print an unsorted sequence of items much like as in a dictionary. The order doesn't depend on the order of insertion of items, nor on the their values but on hash values which we don't know. Using the sorted() function returns a list(not a set) when used on a set. You can verify this by using a=set() print(type(sorted(a))) This is why Kishalaya Saha 's list(s) == sorted(s) is true as there is an identical list on either side. (And not a set). The fact that 544 is unordered and the rest are ordered is merely a characteristic of the hashing function. If you add 101 to the set in your code, 101 too will be unordered.
23rd Oct 2018, 1:28 PM
Mayur Garg
Mayur Garg - avatar
+ 6
[On HASHES] Algorithms vary by object ("sets with mixed keys" ref'd link 5) {link 1} "...the hash of a number x is based on the reduction of x modulo the prime P = 2**_PyHASH_BITS - 1. It's designed so that hash(x) == hash(y) whenever x and y are numerically equal, even if x and y have different types...quick summary: First define..." {2} for tupleHash {3} "...str, bytes and datetime objects are 'salted'...constant [per process]...not predictable between [runs]...protection against DoS {4}" {5} List of algorithms and intended purposes 1: https://github.com/JUMP_LINK__&&__python__&&__JUMP_LINK/cpython/blob/master/Python/pyhash.c#L34 2: https://github.com/python/cpython/blob/master/Objects/tupleobject.c#L348 3: https://docs.python.org/3.5/reference/datamodel.html#object.__hash__ 4: Multiple implementations denial-of-service via hash algorithm collision: http://www.ocert.org/advisories/ocert-2011-003.html 5: Secure and interchangeable hash algorithm: https://www.python.org/dev/peps/pep-0456/
23rd Oct 2018, 4:34 PM
Kirk Schafer
Kirk Schafer - avatar
+ 5
Anna it's possible that hash function is nice enough to respect the ordering for small numbers. BUT... I just tried this s = set(range(1, 1000)) print(list(s) == sorted(s)) And the output was True. So it seems the set would put the numbers in the right order, even though 544 is included in the range! Now I no longer think the ordering depends (only) on the hash value. Maybe someone else has a better idea?
23rd Oct 2018, 12:26 PM
Kishalaya Saha
Kishalaya Saha - avatar
+ 5
Kishalaya Saha list comparison only get true when each elements of both equals
23rd Oct 2018, 1:39 PM
Flandre Scarlet
Flandre Scarlet - avatar
+ 5
Kishalaya Saha maybe the hash function make a range(1,1000) occasionally(?) same as a sorted list, but it usually won't be like a sorted list🤔
23rd Oct 2018, 1:47 PM
Flandre Scarlet
Flandre Scarlet - avatar
+ 5
i found that the hash value for int is just the int itself https://code.sololearn.com/cXlZgZ1az374
23rd Oct 2018, 2:17 PM
Flandre Scarlet
Flandre Scarlet - avatar
+ 4
Kishalaya Saha Wouldn't it be an extremely bad hash value method if the values are almost in perfect order except for one item? 🤔
23rd Oct 2018, 12:10 PM
Anna
Anna - avatar
+ 4
i dont know if this has been said but if not i would say they are sorted like this somehow.... https://code.sololearn.com/cJBPWKOU5ZvS/?ref=app
23rd Oct 2018, 12:36 PM
Mbrustler
Mbrustler - avatar
+ 4
Maybe this Code is interesting for your Discussion: https://code.sololearn.com/cKU78zsBb1Y2/?ref=app
8th Nov 2018, 12:46 PM
Sebastian Keßler
Sebastian Keßler - avatar
+ 3
Mayur Garg I know sorted(s) produces a list, which is precisely why I used it. Try the original example: s = {1,5,87,6,5,4,2,544,16} print(list(s) == sorted(s)) Output is False. How does your theory explain that?
23rd Oct 2018, 1:37 PM
Kishalaya Saha
Kishalaya Saha - avatar
+ 3
Flandre Scarlet I know. But "order in sets depends only on hash value" doesn't explain the why it gives false for s = {1,5,87,6,5,4,2,544,16} but true for s = set(range(1, 1000)) It's not clear to me why the hash value of 544 would be <= that of 1 in one example, but >= in the other. A hash function shouldn't use randomness!
23rd Oct 2018, 1:51 PM
Kishalaya Saha
Kishalaya Saha - avatar
+ 3
Kishalaya Saha simple hashing depends on number of elements, doesn't it? That could change indexing factor.
23rd Oct 2018, 2:05 PM
Rugved Modak
Rugved Modak - avatar
+ 3
Kishalaya Saha As to why values should determine the hashing used, I have something to add to that too. Consider a set containing values 1 to 100 And a simple hashing function using mod. So to store values in this set, let us say python uses mod 100 to store values. That will use up all indexes from 0 to 99 keeping the hash function more optimised. But when we add more random values, using 'mod n' will result in more empty hash values in between causing the hashing to become less optimised. Hence python uses a different hashing mechanism to create a more compact hash. Again, just a speculation.
23rd Oct 2018, 2:07 PM
Mayur Garg
Mayur Garg - avatar
+ 2
Mbrustler that's a very good guess, but doesn't appear to be correct (or the whole story).Try this: print({1, 543, 544}) # {544, 1, 543} print(id(544) <= id(1)) # True print(id(1) <= id(543)) # False
23rd Oct 2018, 12:48 PM
Kishalaya Saha
Kishalaya Saha - avatar