+ 14
Why is += so fast with strings?
In my code down there I compare the performance between adding a letter from an original string to: a.) a new string, and b.) a list. I expected the list version to be quicker, because letter by letter is appended, while with string, an immutable type, the whole string would have to be reproduced each time. Funnily, the string version is quicker! Anybody knows why that is? https://code.sololearn.com/chssPtx1eQ6W/?ref=app
23 ответов
+ 13
I wasn't actually able to get a solution for this, but if it might help, I tried playing with the numbers a bit. I multiplied the string by 100 (I also removed one 0 from the number argument in timeit or else it took too much time). It revealed that for large strings, += takes much longer that list appending. Here is the screenshot of the results (tested the code 10 times)
https://github.com/ak-gupta02/String-conc-vs-List-appending/blob/master/Screenshot_20200620-115949.png
(requesting desktop site is recommended)
And this is the code I used
https://code.sololearn.com/cwEY8aR7L04X/?ref=app
+ 9
Operations done by symbols seem to be faster than operations made by named functions.
This is a conclusion made by comparing ** and pow, where ** was 40 times faster than pow.
+ 9
Does this this help?:
https://www.python.org/doc/essays/list2str/
+ 7
I analyzed a bit.
Have fun
https://code.sololearn.com/coxM6um4PsKU/?ref=app
+ 6
XXX, okay, that's what happens when you only know Python on the surface. 😂
So when the string is too short for it to matter, I'm probably rather comparing the methods's overhead instead of the actual difference in data copying, hm?
As soon as the string gets longer, the picture looks quite differently.
Thank you, I'll keep that in mind for my next messing-around with timeit!
Funny side note though:
If I add a ''.join call which I would have to to get an actual string, the list version falls behind again, even with big numbers. 😶
+ 6
https://www.sololearn.com/post/444501/?ref=app
now also list +=🤔🤔🤔🤔
+ 4
Are you sure the functions are actually being called?
Just had a look at python docs...and this is how they show how to give the timeit module access to functions you define...
print(timeit(stmt="f()", number=100000, globals=globals()))
print(timeit(stmt="g()", number=100000, globals=globals()))
# or
print(timeit(stmt="f()", setup="from __main__ import f", number=100000))
print(timeit(stmt="g()", setup="from __main__ import g", number=100000))
..this shows list is slightly quicker.
edit...well....list was faster on my pc using pycharm...but not conclusive in glayground.
+ 4
HonFu yes...
but what I see is inplace add.
🤔🤔🤔
surprisibg.
+ 4
HonFu I spent some time trying (very hard) to look at the source code of ''.join I found from somewhere. And though I understood almost nothing, I understood enough that iteration on the iterable is done. This bit was enough to tell me that
for (i = 0; i < seqlen; i++) {
size_t add_sz;
item = items[i];
if (!PyUnicode_Check(item)) {
PyErr_Format(PyExc_TypeError, "sequence item %zd: expected str instance," " %.80s found", i, Py_TYPE(item)->tp_name);
goto onError;
}
So one thing's for sure, that Python iterates through the iterable, converts each to a string (because we can see the exact TypeError ''.join gives) and probably adds to the final string.
Edit:
Here's the link
https://github.com/python/cpython/blob/master/Objects/unicodeobject.c
Just use 'Find in page' to go to the 2nd occurrence of '_PyUnicode_JoinArray'
+ 2
Thanks so far, everybody, but as far as I see, none of the links really explains *how* string concatenation of increasingly large strings can end up being faster than appending only single letters to a list.
Or it's in there and I haven't understood it, I'll not dismiss that possibility altogether. ;)
It was all good general reads about optimization.
Seb TheS, yeah, using the built-in stuff is usually optimized and runs quicker than had you written it in pure Python.
However, the returning of a new string each time ('a', 'ab', 'abc' etc.) *has* to happen anyway, doesn't it? 🤔
rodwynnejones, you can save yourself a lot of research work with a little trying:
Just add a print statement to one of my functions (reduce number of calls) and see if it's executed. 😉
+ 2
HonFu it's evident that ''.join will take more time, because not only does it have to make a list first, but also iterate through the list again and make a string.
+ 2
Okay, I can't really read that, Oma Falk - guess I'll have to check out dis more closely to get a better idea of what's going on, hm?
+ 2
Okay, that makes sense. So we're actually not any wiser, looking at dis, hm?
That lid would also have to go off. 😅
+ 1
Oma Falk, might that mean that str only 'behaves' as if it was immutable?
(That's what you get, looking into the motor room.)
+ 1
I have no idea who had this brilliant idea of adding a string concatenation function to Python then go ahead and add an operator that does the same thing. However,.…
The way Python handles strings is quite unique, it stores all the attributes of a string with it like size, start and end. Now, strings are immutable(whatsoever that means) but they are also not so immutable to their own class methods like join. What join does is the sane thing and copies the string given to it right after the end and then changes the end(efficient C Style). The adding assignment uses a __add__overload and __eq__ overload which creates a longer path, uses some really slower external assignment and reconstructs "parts" of the string again in the list and store them. That makes the adding assignment, which you might have guessed already, slower and really REALLY tempting to use.
Which explains Oma Falk 's output, which is a gross approximation of the instructions of the Python VM/Interpreter.
+ 1
HonFu That INPLACE_ADD is the external assignment with concatenation and the pain in the bottom.
+ 1
Short Version.
For str in Python += is join work the same and works all C Style.(without C Style Strings, of course)
For list in Python += is a slow operator overloading.
Efficient class method is faster than slow operator overloading.
+ 1
Built-in types and other entities faster than user defined, and its interpreter implementation dependant (cpython, etc)
Think its the way but my knowledge of python is shorty
also it depends of SL server so how trusty is that timming measuring?