Python Performance: Why 'if not list' is 2x Faster Than Using len()

abhi9u@lemmy.world · 7 months ago


thebestaquaman@lemmy.world · 7 months ago

I write a lot of Python. I hate it when people use “X is more pythonic” as some kind of argument for what is a better solution to a problem. I also have a hang-up with people acting like Python has any form of type safety, instead of just embracing duck typing. This lands us at the following:

The article states that “you can check a list for emptiness in two ways: if not mylist or if len(mylist) == 0”. Already here, a fundamental mistake has been made: You don’t know (and shouldn’t care) whether mylist is a list. These two checks are not different ways of doing the same thing, but two different checks altogether. The first checks whether the object is “falsey” and the second checks whether the object has a well-defined length that is zero. These are two completely different checks, which often (but far from always) overlap. Embrace duck typing: type safe python is a myth.
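A minimal sketch of how the two checks diverge once the argument is not guaranteed to be a list. The helper names is_falsy and has_zero_length and the sample values are made up for illustration; they are not from the article.

```python
def is_falsy(obj):
    """Return True when the object is falsy (what `if not obj` tests)."""
    return not obj


def has_zero_length(obj):
    """Return True when len(obj) is zero; raises TypeError if obj has no length."""
    return len(obj) == 0


samples = [[], "", {}, 0, None, [0]]
for value in samples:
    try:
        length_result = has_zero_length(value)
    except TypeError:
        # Objects such as 0 and None are falsy but have no length at all.
        length_result = "TypeError"
    print(f"{value!r:>6}: falsy={is_falsy(value)}, zero length={length_result}")
```

For the built-in containers the two columns agree; for 0 and None only the falsiness check gives an answer.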

Avicenna@lemmy.world · edit-2 7 months ago

Isn’t the expected behaviour exactly identical on any object that has len defined?

“By default, an object is considered true unless its class defines either a __bool__() method that returns False or a __len__() method that returns zero, when called with the object.”

PS: Well, I guess your objection is that we can’t know in advance whether the object has len defined (for example, by being a collection), so this question doesn’t really apply to your post.

CompassRed@discuss.tchncs.de · 7 months ago

It’s not the same, and you kinda answered your own question with that quote. Consider what happens when an object defines both dunder bool and dunder len. It’s possible for dunder len to return 0 while dunder bool returns True, in which case the falsy-ness of the instance would not depend at all on the value of len.
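A small sketch of that case. AlwaysTruthyBag is a hypothetical class invented for this example; the only thing it relies on is that Python consults __bool__ before __len__ when testing truth.

```python
class AlwaysTruthyBag:
    """Hypothetical container that defines both __len__ and __bool__."""

    def __init__(self, items=()):
        self._items = list(items)

    def __len__(self):
        return len(self._items)

    def __bool__(self):
        # __bool__ takes precedence over __len__ in truth testing.
        return True


bag = AlwaysTruthyBag()
print(len(bag) == 0)  # True: the bag is empty
print(not bag)        # False: the bag is still truthy
```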

thebestaquaman@lemmy.world · 7 months ago

Exactly as you said yourself: Checking falsiness does not guarantee that the object has a length. There is considerable overlap between the two, and if it turns out that this check is a performance bottleneck (which I have a hard time imagining) it can be appropriate to check for falsiness instead of zero length. But in that case, don’t be surprised if you suddenly get an obscure bug because of some custom object not behaving the way you assumed it would.

I guess my primary point is that we should be checking for what we actually care about, because that makes intent clear and reduces the chance for obscure bugs.

sugar_in_your_tea@sh.itjust.works · 7 months ago

type safe python is a myth

Sure, but type hints provide a ton of value in documenting for your users what the code expects. I use type hints everywhere, and it’s fantastic! Yes, there’s no guarantee that the types are correct, but with static analysis and the assumption that your users want their code to work correctly, there’s a very high chance that the types are correct.

That said, I lie about types all the time. For example, if my function accepts a class instance as an argument, the intention is that the code accept any class that implements the same methods as the one I’ve defined in the parameter list, and you don’t necessarily have to pass an instance of that class in (or one of its sub-classes). But I feel like putting something reasonable in there makes a lot more sense than nothing, and I can clarify in the docstring that I really just need something that looks like that object. One of these days I’ll get around to switching that to Protocol classes to reduce type errors.
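A rough sketch of what switching to Protocol classes could look like. SupportsSpeak, Duck, Robot and announce are hypothetical names for illustration; typing.Protocol itself is part of the standard library (Python 3.8+).

```python
from typing import Protocol


class SupportsSpeak(Protocol):
    """Hypothetical protocol: anything with a speak() method returning str."""

    def speak(self) -> str: ...


class Duck:
    def speak(self) -> str:
        return "Quack"


class Robot:
    # Not related to Duck by inheritance, but structurally compatible.
    def speak(self) -> str:
        return "Beep"


def announce(speaker: SupportsSpeak) -> str:
    """Accept any object that structurally matches SupportsSpeak."""
    return speaker.speak().upper()


print(announce(Duck()))
print(announce(Robot()))
```

With a hint like this, a static checker can accept both classes without either one inheriting from the other, so the hint no longer has to lie about the expected type.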

That said, I don’t type hint everything. A lot of private methods and private functions don’t have types, because they’re usually short and aren’t used outside the class/file anyway, so what’s the point?

thebestaquaman@lemmy.world · 7 months ago

Type hints are usually great, as long as they’re kept up to date and the IDE interprets them correctly. Recently I’ve had some problems with PyCharm acting up and insisting that matplotlib doesn’t accept numpy arrays, leading me to just disable the type checker altogether.

All in all, I’m a bit divided on type hints, because I’m unsure whether I think the (huge) value added from correct type hints outweighs the frustration I’ve experienced from incorrect type hints. For now I’m leaning towards “type hints are good, as long as you never blindly trust them and only treat them as a coarse indicator of what some dev thought at some point.”

sugar_in_your_tea@sh.itjust.works · 7 months ago

leading me to just disable the type checker altogether.

The better option is to just put # type: ignore on the statements where it gets confused, and add hints for your code. I’ve done that for SQLAlchemy before they got proper type hinting, and it worked pretty well.

That said, a type hint is just that, a hint. It shouldn’t be relied on to be 100% accurate (i.e. lots of foo: list should actually be foo: list | None), but if you use a decent static analysis tool, you should catch the worst of it. We use pyright, which is built in to the VSCode extension pylance. It works incredibly well, though it’s a bit too strict in many cases (e.g. when things can be None but generally aren’t).
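As an illustration of the foo: list versus foo: list | None point, here is a minimal sketch. The function total is hypothetical; Optional[list] is used so the snippet also runs on interpreters older than 3.10, where it means the same thing as list | None.

```python
from typing import Optional


def total(values: Optional[list] = None) -> int:
    """Sum a list of numbers, treating None the same as an empty list."""
    if values is None:
        # Explicit None check: this is the case a bare `list` hint would hide.
        return 0
    return sum(values)


print(total())
print(total([1, 2, 3]))
```

Once the hint admits None, a static analysis tool can flag callers or code paths that forget the check.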

So yeah, never blindly trust type hints, but do use them everywhere. The more hints you have, the more the static analysis can help, and disabling them on a case-by-case basis is incredibly easy. You’ll probably still get some runtime exceptions that correct type checking could have caught, but it’s a lot better than having a bunch of verbose checks everywhere that make no sense. A good companion to type checks is robust unit test cases with reasonable data (i.e. try to exercise the boundaries of what users can input).

As it stands, we very rarely get runtime exceptions due to poor typing because our type hints are generally pretty good and our unit test cases back that up. Don’t blindly trust it, and absolutely read the docs for anything you plan to use, but as long as you are pretty consistent, you can start making some assumptions about what your data looks like.

Sirber@lemmy.ca · edit-2 7 months ago

How does Python know if it’s my list or not?

dblsaiko@discuss.tchncs.de · 7 months ago

Telemetry

JasonDJ@lemmy.zip · 7 months ago

if isinstance(mylist, list) and not mylist

Problem solved.

Or if not mylist # check if list is empty

Sirber@lemmy.ca · 7 months ago

I think you missed the joke 😅

PattyMcB@lemmy.world · 7 months ago

I thought it was funny!

gravitas_deficiency@sh.itjust.works · 7 months ago

You’re checking if mylist is falsey. Sometimes that’s the same as checking if it’s empty, if it’s actually a list, but that’s not guaranteed.

JasonDJ@lemmy.zip · 7 months ago

Doesn’t Python treat all empty iterables as false tho? This isn’t unique to python, is it? (though I’m not a programmer…just a dude who writes scripts every now and then)

gravitas_deficiency@sh.itjust.works · 7 months ago

My point is that the second statement you presented can have the effect of evaluating emptiness of a Sequence (note: distinct from an Iterable), but that only holds true if the target of the conditional IS a sequence. I’m underlining the semantic difference that was elided as a result of falsey evaluation.

JasonDJ@lemmy.zip · 7 months ago

Ok, help a noob out. What is the difference between a sequence and an iterable? Is a sequence immutable, like a tuple?

48954246@lemmy.world · 7 months ago

An iterable is just something that can be iterated over, like range(10), or [1, 2, 3].

A sequence on the other hand is a Collection that is reversible.

https://docs.python.org/3/library/collections.abc.html#collections-abstract-base-classes
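A quick way to see the difference is to ask the abstract base classes from collections.abc directly. The sample objects and the squares generator below are arbitrary picks for illustration.

```python
from collections.abc import Collection, Iterable, Sequence


def squares():
    """A generator: iterable, but neither sized nor indexable."""
    for n in range(3):
        yield n * n


candidates = {
    "list": [1, 2, 3],
    "range": range(10),
    "set": {1, 2, 3},
    "generator": squares(),
}

for name, obj in candidates.items():
    print(
        f"{name:>9}: Iterable={isinstance(obj, Iterable)}, "
        f"Collection={isinstance(obj, Collection)}, "
        f"Sequence={isinstance(obj, Sequence)}"
    )
```

All four are iterable, only the list and the range count as sequences, and the generator is not even a Collection because it has no length.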

gravitas_deficiency@sh.itjust.works · 7 months ago

I know what an iterable is. But I am talking about Type[Iterable], which iirc does not obey falsey eval when empty.

gravitas_deficiency@sh.itjust.works · edit-2 7 months ago

thing: Sequence[Any] iirc is iterable, indexable, and reversible.

thing: Iterable[Any] only guarantees that it’s iterable - and note that iterating can sometimes have the effect of consuming the iterable (e.g. when working with streaming interfaces).

BlackRoseAmongThorns@slrpnk.net · 7 months ago

Not really, generators have weird truthiness. I don’t remember if they evaluate to true or false, but they cannot be checked for emptiness, so they default to either always true or always false.
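For reference, a short check of what actually happens: a generator object defines neither __bool__ nor __len__, so it falls under the default rule and is always truthy. The sentinel trick at the end is just one possible way to probe for emptiness, and it consumes an item if there is one.

```python
empty_gen = (x for x in [])

print(bool(empty_gen))  # True: no __bool__ or __len__, so the default applies

try:
    len(empty_gen)
except TypeError as exc:
    print(f"len() fails: {exc}")

# Probing for emptiness means trying to pull an item out of the generator.
sentinel = object()
first = next(empty_gen, sentinel)
print(first is sentinel)  # True: nothing was produced
```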

gargolito@lemm.ee · 7 months ago

Python likes giving lists.

PattyMcB@lemmy.world · 7 months ago

I know I’m gonna get downvoted to oblivion for this, but… Serious question: why use Python if you’re concerned about performance?

lengau@midwest.social · 7 months ago

It’s all about trade-offs. Here are a few reasons why one might care about performance in their Python code:

Performance is often more tied to the code than to the interpreter - an O(n³) algorithm in blazing fast C won’t necessarily perform any better than an O(nlogn) algorithm in Python.
Just because this particular Python code isn’t particularly performance constrained doesn’t mean you’re okay with it taking twice as long.
Rewriting a large code base can be very expensive and error-prone. Converting small, very performance-sensitive parts of the code to a compiled language while keeping the bulk of the business logic in Python is often a much better value proposition.

These are also performance benefits one can get essentially for free with linter rules.

Anecdotally: in my final year of university I took a computational physics class. Many of my classmates wrote their simulations in C or C++. I would rotate between Matlab, Octave and Python. During one of our labs where we wrote particle simulations, I wrote and ran Octave and Python simulations in the time it took my classmates to write their C/C++ versions, and the two fastest simulations in the class were my Octave and Python ones, respectively. (The professor’s own sim came in third place). The overhead my classmates had dealing with poorly optimised code that caused constant cache misses was far greater than the interpreter overhead in my code (though at the time I don’t think I could have explained why their code was so slow compared to mine).

PattyMcB@lemmy.world · 7 months ago

I appreciate the large amount of info. Great answer. It just doesn’t make sense to me, all things being equal (including performant algorithms), why choose Python and then make a small performance tweak like in the article? I understand preferring the faster implementation, but it seems to me like waxing your car to reduce wind resistance to make it go faster, when installing a turbo-charger would be much more effective.

Teanut@lemmy.world · 7 months ago

If you use the profiler and see that the slower operation is being used frequently, and is taking up a chunk of time deemed significant, why not swap it to the faster version?

In a simulation I’m working on that goes through 42 million rounds I spent some time profiling and going through the code that was eating up a lot of time (especially things executed all 42 million times) and trying to find some optimizations. Brought the run time down from about 10 minutes to 5 minutes.

I certainly wasn’t going to start over in C++ or Rust, and if I’d started with either of those languages I would have missed out on a lot of really strong Python libraries and probably spent more time coding rather than refining the simulation.
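For anyone who wants to measure the checks from the article on their own machine, a minimal timeit sketch follows. The repeat counts are arbitrary, the lambdas add call overhead to every variant, and the numbers will differ between interpreters and versions, so treat the output as a rough comparison only.

```python
import timeit

mylist = []

checks = {
    "not mylist": lambda: not mylist,
    "len(mylist) == 0": lambda: len(mylist) == 0,
    "not len(mylist)": lambda: not len(mylist),
}

results = {}
for label, check in checks.items():
    # Best of 5 repeats of 100,000 calls each; lambda call overhead is included.
    results[label] = min(timeit.repeat(check, number=100_000, repeat=5))

for label, seconds in results.items():
    print(f"{label:<18} {seconds:.4f} s per 100,000 calls")
```

In a real program, a profiler such as cProfile would first tell you whether a check like this is executed often enough to matter.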

lengau@midwest.social · 7 months ago

I think a better analogy would be that you’re tuning your bike for better performance because the trade-offs of switching to a car are worse than keeping the bike.

PattyMcB@lemmy.world · 7 months ago

Ok… I’ll buy that

uis@lemm.ee · edit-2 7 months ago

Performance is often more tied to the code than to the interpreter - an O(n³) algorithm in blazing fast C won’t necessarily perform any better than an O(nlogn) algorithm in Python.

An O(n³) algorithm in Python won’t necessarily perform any better than an O(nlogn) algorithm in C. Ever heard of galactic algorithms?

The overhead my classmates had dealing with poorly optimised code that caused constant cache misses was far greater than the interpreter overhead in my code (though at the time I don’t think I could have explained why their code was so slow compared to mine).

Did they write naive linear algebra operators?

Takapapatapaka@lemmy.world · 7 months ago

You may want to benefit from a little performance boost even though you mostly don’t need it and still need Python’s advantages. Being interested in performance isn’t always about looking for the very best performance there is out of any language; it can also mean using little tips to go a tiny bit faster when you can.

Jerkface (any/all)@lemmy.ca · 7 months ago

Alternatively, why wait twice as long for your python code to execute as you have to?

Reptorian@lemmy.zip · 7 months ago

I have the same question. I prefer other languages. I use G’MIC for image processing over Python and C++.

Avicenna@lemmy.world · edit-2 7 months ago

Yeah, and then you use “not” with a variable name that does not make it obvious that it is a list, and another person who reads the code thinks it is a bool. Hell, a couple of months later you yourself won’t even remember that it is a list. Moreover, “not” will not throw an error if you pass something other than the sequence/collection you were supposed to, but len will.

You should not sacrifice code readability and safety for over-optimization. This is Python after all; I don’t think list length checks will be your bottleneck.

Jerkface (any/all)@lemmy.ca · 7 months ago

Strongly disagree that not x implies to programmers that x is a bool.

taladar@sh.itjust.works · 7 months ago

It does if you are used to sane languages instead of the implicit conversion nonsense C and the “dynamic” languages are doing

Avicenna@lemmy.world · 7 months ago

well it does not imply directly per se since you can “not” many things but I feel like my first assumption would be it is used in a bool context

thebestaquaman@lemmy.world · 7 months ago

I would say it depends heavily on the language. In Python, it’s very common that different objects have some kind of Boolean interpretation, so assuming that an object is a bool because it is used in a Boolean context is a bit silly.

Avicenna@lemmy.world · edit-2 7 months ago

Well, fair enough, but I still like the fact that len makes the aim and the object more transparent on a quick look through the code, which is what I am trying to get at. The supporting argument on bools wasn’t very much to the point, I agree.

That being said is there an application of “not” on other classes which cannot be replaced by some other more transparent operator (I confess I only know the bool and length context)? I would rather have transparently named operators rather than having to remember what “not” does on ten different types. I like duck typing as much as the next person, but when it is so opaque (name-wise) as in the case of “not”, I prefer alternatives.

For instance, open or read on different objects really do open or read some data, whereas not on some object does god knows what, and I have to memorise each case.

Jerkface (any/all)@lemmy.ca · edit-2 7 months ago

Truthiness is so fundamental that in most languages all values have a truthiness, whether they are bool or not. Even in C, int x = value(); if (!x) x_is_zero(); is valid and idiomatic.

I appreciate the point that calling a method gives more context cues and potentially aids readability, but in this case I feel like not is the python idiom people expect and reads just fine.

Avicenna@lemmy.world · 7 months ago

I don’t know, it throws me off but perhaps because I always use len in this context. Is there any generally applicable practical reason why one would prefer “not” over len? Is it just compactness and being pythonic?

Jerkface (any/all)@lemmy.ca · edit-2 7 months ago

It’s very convenient not to have to remember a bunch of different means/methods for performing the same conceptual operation. You might call len(x) == 0 on a list, but next time it’s a dict. Time after that it’s a complex number. The next time it’s an instance. not works in all cases.
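A small sketch of that point, using the same kinds of values. Inventory is a hypothetical class made up for the example; the complex zero and the instance both answer to not but have no len.

```python
class Inventory:
    """Hypothetical class that opts in to truth testing via __bool__."""

    def __init__(self, stock):
        self.stock = stock

    def __bool__(self):
        return self.stock > 0


values = [[], {}, 0j, Inventory(0)]

for value in values:
    try:
        length_check = len(value) == 0
    except TypeError:
        # complex numbers and Inventory define no __len__.
        length_check = "unsupported"
    print(f"{type(value).__name__:>9}: not -> {not value}, len == 0 -> {length_check}")
```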

thebestaquaman@lemmy.world · 7 months ago

I definitely agree that len is the preferred choice for checking the emptiness of an object, for the reasons you mention. I’m just pointing out that assuming a variable is a bool because it’s used in a Boolean context is a bit silly, especially in Python or other languages where any object can have a truthiness value, and where this is commonly utilised.

Avicenna@lemmy.world · 7 months ago

It is not “assume” as in a conscious “this is probably a bool I will assume so” but more like a slip of attention by someone who is more used to the bool context of not. Is “not integer” or “not list” really that commonly used that it is even comparable to its usage in bool context?

Glitchvid@lemmy.world · 7 months ago

if not x then … end is very common in Lua for similar purposes, very rarely do you see hard nil comparisons or calls to typeof (last time I did was for a serializer).

catloaf@lemm.ee · 7 months ago

You can make that assumption at your own peril.

WhyJiffie@sh.itjust.works · 7 months ago

I don’t think they are a minority

Avicenna@lemmy.world · 7 months ago

If anything len tells you that it is a sequence or a collection, “not” does not tell you that. That I feel like is the main point of my objection.

Jerkface (any/all)@lemmy.ca · 7 months ago

deleted by creator

acosmichippo@lemmy.world · edit-2 7 months ago

i haven’t programmed since college 15 years ago and even i know that 0 == false for non bool variables. what kind of professional programmers wouldn’t know that?

acosmichippo@lemmy.world · 7 months ago

if you’re worried about readability you can leave a comment.

thebestaquaman@lemmy.world · 7 months ago

There is no guarantee that the comment is kept up to date with the code. “Self documenting code” is a meme, but clearly written code is pretty much always preferable to unclear code with a comment, largely because you can actually be sure that the code does what it says it does.

Note: You still need to comment your code, kids.

Avicenna@lemmy.world · 7 months ago

If there is an alternative that achieves the same intended effect and is a bit safer (because it verifies that len is implemented), I would prefer that to commenting. Also, having to comment every use of not that stands in for a len check sounds quite redundant, as len checks are very common.

antlion@lemmy.dbzer0.com · 7 months ago

Could also compare against:

if not len(mylist)

That way this version isn’t evaluating two functions. The bool evaluation of an integer is false when zero, otherwise true.

FooBarrington@lemmy.world · 7 months ago

This is honestly the worst version regarding readability. Don’t rely on implicit coercion, people.

antlion@lemmy.dbzer0.com · 7 months ago

But the first example does the same thing for an empty list. I guess the lesson is that if you’re measuring the speed of arbitrary stylistic syntax choices, maybe Python isn’t the best language for you.

🌶️ - knighthawk@lemmy.ml · 7 months ago

so these are the only 2 ways then? huge if true

uis@lemm.ee · 7 months ago

There are decades of articles on C++ optimizations that say “use empty() instead of size()”, which is the same as here.

palordrolap@fedia.io · 7 months ago

As a Perl fossil I recognise this syntax as equivalent to if(not @myarray) which does the same thing. And here I was thinking Guido had deliberately aimed to avoid Perlisms in Python.

That said, the Perlism in question is the right* way to do it in Perl. The length operator does not do the expected thing on an array variable. (You get the length of the stringified length of the array. And a warning if those are enabled.)

* You can start a fight with modern Perl hackers with whether unless(@myarray) is better or just plain wrong, even if it works and is equivalent.

tiredofsametab@fedia.io · 7 months ago

I really liked unless in perl; especially as I get older !length or something makes that bang really easy to miss. I use !(length) or something instead to visually set it aside. unless made this much more visually clear.

Womble@lemmy.world · 7 months ago

Empty sequences being false goes back a lot further than Perl; it was already a thing in the first Lisp (in fact, the empty list was the canonical false).