+ 6
* CODER CHALLENGE *: 35% easy alpha strings compression
There are only 38 alpha signs (26 lowercase alphabetical, numbers, " " and "."). But a character can take 256 values. So a string of 21 characters can take 256^21 values. 32 alphas can fit in this string because 256^21 <~ 38^32. Lets use this to do fast but powerful compression ! Write 3 fonctions : 1) compress(x) taking a string x of length L including only alpha signs and returning a string c of length 21/32×L "compressing" x. 2) uncompress(c) that does the opposite 3) test(x) which demonstrates that this works ! NB: It is possible, contrary to what some say hereunder ;-)
28 Réponses
+ 7
Thanks goodness that was clarified 'cause man I couldn't figure these previous conditions out. Anyway, here's my try
https://code.sololearn.com/cebJBx76Mg0x/?ref=app
+ 16
You got my attention, but unfortunately your assumptions are wrong:
1. You can't represent text with 38 chars - that means only lower or uppercase chars, no punctuation except "."
2. Computers do not use decimal, but binary - to represent 256 chars you need 8 bits, to represent 38 - 6 bits. Actually with those 6 bits you can store up to 64 chars, but still the max compression will be (8-6)/8*100% which is just 25%
If you are interested in string compression, this is a good place to start:
https://en.wikipedia.org/wiki/Huffman_coding
+ 8
@VcC - I agree with 35% I am just curious what Huffman could achieve compared to Base38 conversion, would give it a try later 😃
+ 6
Interesting~ We demand test case with examples! 😆
+ 6
Hi @Nikolay, I'm not an expert but I believe you are thinking strictely in a characterset way.. but if you use a string that rappresents only the set of alphabetical characters the idea would be right..
Now I cannot explain my idea with an example but as soon as possible I would be back.. :)
+ 4
Interesring insight @Nikolay.
What actually attracts you @Nikolay? 😂
+ 3
x="this is an example of string to compress largely thanks to this very simple method. guess how long the compressed version will be..... my friends note that if you do the math this 256 characters string should reduce to a 168 characters compressed string..."
+ 3
https://code.sololearn.com/cjXsBYqn4x9Q/?ref=app code need some work but i think it works
+ 3
@nikolai I dont agree ! There was indeed a typo (compression ratio 35% not 85%).This is lowercase only in this challenge (would apply for example to compress a dictionnary or texts where just the first letter is always uppercase). The challenge is about compressing with a simple method doable but beginners : you dont need huffman if what you want to compress have obvious redundancies like using only 38 positions in a 256 positions set ;-) .
And you can be smarter than just using more bits that you need (think about packing a 38 alphas string rather than just trying to fit each in more bits than you need).
Agree that you compress less is you need more character but this is ot what this challenge is about !
+ 3
Hard to read on mobile, you mean? The code should be as pep8 compliant as possible (I think, I hope). Also, changed the sample text to a non-mod32 string long, check out please if the output is still as expected.
+ 3
Note that this method can be used whenever you are only using a limited number of characters. See this code to calculate the compression ratio guaranteed, and the length of the strings you need to use to reach the ratio.
https://code.sololearn.com/cO0lFn001sqX
+ 3
@VcC fixed, working with 6bit encoding. plus I'll work for one of 5bit :) so the charset will be with ronmans numbers, so 26 lowercase \s . does 28, plus "R,N" meaning number roman, like 26 = RNxxvi
28+R+N=30 fixing in 5bits :)
+ 2
Interesting. Can you add a test function doing uncompress(compress(x)) and showing the length of both strings with the test string x of length 256 ?
+ 2
https://code.sololearn.com/W39OboSy879P/?ref=app
Now, it's working, for 6bits, and with some details for 5bits
+ 2
ready 👌
+ 1
Great one ! Works like a charm. Great use of python to produce a short code (not so easy to read though). Improvement could be to deal with padding zeros for strings with a size not multiple of 31...
+ 1
The code is ultrasimple, very fast, and need no overhead/dictionnary. Beats any compression for short texts. Good solution when you use only some signes in the dictionary :-). And great way to understand compression in a simple case.
0
Note that the compressed string will probably wont be printable.
0
@Michael totally !
0
@Edgar Good use of lambda functions. This is a good example where python helps solving quickly in a couple of lines processing that would take much more dev time with other languages. Note that you perform a n^2 complexity (multiplication / division involving a number of n bytes takes a time n).
Anybody to provide a faster version ?