+ 6

* CODER CHALLENGE *: 35% easy alpha strings compression

There are only 38 alpha signs (26 lowercase alphabetical, numbers, " " and "."). But a character can take 256 values. So a string of 21 characters can take 256^21 values. 32 alphas can fit in this string because 256^21 <~ 38^32. Lets use this to do fast but powerful compression ! Write 3 fonctions : 1) compress(x) taking a string x of length L including only alpha signs and returning a string c of length 21/32×L "compressing" x. 2) uncompress(c) that does the opposite 3) test(x) which demonstrates that this works ! NB: It is possible, contrary to what some say hereunder ;-)

26th Aug 2017, 3:12 PM
VcC
VcC - avatar
28 Réponses
+ 7
Thanks goodness that was clarified 'cause man I couldn't figure these previous conditions out. Anyway, here's my try https://code.sololearn.com/cebJBx76Mg0x/?ref=app
26th Aug 2017, 10:48 AM
Edgar Garrido
Edgar Garrido - avatar
+ 16
You got my attention, but unfortunately your assumptions are wrong: 1. You can't represent text with 38 chars - that means only lower or uppercase chars, no punctuation except "." 2. Computers do not use decimal, but binary - to represent 256 chars you need 8 bits, to represent 38 - 6 bits. Actually with those 6 bits you can store up to 64 chars, but still the max compression will be (8-6)/8*100% which is just 25% If you are interested in string compression, this is a good place to start: https://en.wikipedia.org/wiki/Huffman_coding
26th Aug 2017, 6:49 AM
Nikolay Nachev
Nikolay Nachev - avatar
+ 8
@VcC - I agree with 35% I am just curious what Huffman could achieve compared to Base38 conversion, would give it a try later 😃
26th Aug 2017, 11:51 AM
Nikolay Nachev
Nikolay Nachev - avatar
+ 6
Interesting~ We demand test case with examples! 😆
25th Aug 2017, 9:26 PM
Zephyr Koo
Zephyr Koo - avatar
+ 6
Hi @Nikolay, I'm not an expert but I believe you are thinking strictely in a characterset way.. but if you use a string that rappresents only the set of alphabetical characters the idea would be right.. Now I cannot explain my idea with an example but as soon as possible I would be back.. :)
26th Aug 2017, 7:00 AM
Michael Isac Girardi
Michael Isac Girardi - avatar
+ 4
Interesring insight @Nikolay. What actually attracts you @Nikolay? 😂
26th Aug 2017, 7:40 AM
👑 Prometheus 🇸🇬
👑 Prometheus 🇸🇬 - avatar
+ 3
x="this is an example of string to compress largely thanks to this very simple method. guess how long the compressed version will be..... my friends note that if you do the math this 256 characters string should reduce to a 168 characters compressed string..."
25th Aug 2017, 9:56 PM
VcC
VcC - avatar
+ 3
https://code.sololearn.com/cjXsBYqn4x9Q/?ref=app code need some work but i think it works
26th Aug 2017, 12:05 AM
Izi Lior Katie
Izi Lior Katie - avatar
+ 3
@nikolai I dont agree ! There was indeed a typo (compression ratio 35% not 85%).This is lowercase only in this challenge (would apply for example to compress a dictionnary or texts where just the first letter is always uppercase). The challenge is about compressing with a simple method doable but beginners : you dont need huffman if what you want to compress have obvious redundancies like using only 38 positions in a 256 positions set ;-) . And you can be smarter than just using more bits that you need (think about packing a 38 alphas string rather than just trying to fit each in more bits than you need). Agree that you compress less is you need more character but this is ot what this challenge is about !
26th Aug 2017, 8:23 AM
VcC
VcC - avatar
+ 3
Hard to read on mobile, you mean? The code should be as pep8 compliant as possible (I think, I hope). Also, changed the sample text to a non-mod32 string long, check out please if the output is still as expected.
26th Aug 2017, 11:16 AM
Edgar Garrido
Edgar Garrido - avatar
+ 3
Note that this method can be used whenever you are only using a limited number of characters. See this code to calculate the compression ratio guaranteed, and the length of the strings you need to use to reach the ratio. https://code.sololearn.com/cO0lFn001sqX
26th Aug 2017, 11:23 AM
VcC
VcC - avatar
+ 3
@VcC fixed, working with 6bit encoding. plus I'll work for one of 5bit :) so the charset will be with ronmans numbers, so 26 lowercase \s . does 28, plus "R,N" meaning number roman, like 26 = RNxxvi 28+R+N=30 fixing in 5bits :)
26th Aug 2017, 3:27 PM
ysraelcon
ysraelcon - avatar
+ 2
Interesting. Can you add a test function doing uncompress(compress(x)) and showing the length of both strings with the test string x of length 256 ?
26th Aug 2017, 5:20 AM
VcC
VcC - avatar
+ 2
https://code.sololearn.com/W39OboSy879P/?ref=app Now, it's working, for 6bits, and with some details for 5bits
26th Aug 2017, 1:00 PM
ysraelcon
ysraelcon - avatar
+ 2
ready 👌
26th Aug 2017, 4:38 PM
ysraelcon
ysraelcon - avatar
+ 1
Great one ! Works like a charm. Great use of python to produce a short code (not so easy to read though). Improvement could be to deal with padding zeros for strings with a size not multiple of 31...
26th Aug 2017, 10:53 AM
VcC
VcC - avatar
+ 1
The code is ultrasimple, very fast, and need no overhead/dictionnary. Beats any compression for short texts. Good solution when you use only some signes in the dictionary :-). And great way to understand compression in a simple case.
26th Aug 2017, 11:54 AM
VcC
VcC - avatar
0
Note that the compressed string will probably wont be printable.
25th Aug 2017, 10:25 PM
VcC
VcC - avatar
0
@Michael totally !
26th Aug 2017, 8:29 AM
VcC
VcC - avatar
0
@Edgar Good use of lambda functions. This is a good example where python helps solving quickly in a couple of lines processing that would take much more dev time with other languages. Note that you perform a n^2 complexity (multiplication / division involving a number of n bytes takes a time n). Anybody to provide a faster version ?
26th Aug 2017, 2:44 PM
VcC
VcC - avatar