Coding Pattern: Check at Least One
Today we’ll discuss a somewhat technical issue that I bumped into a few days ago: what is the optimal way to check if at least one element in a list satisfies some condition?
Let’s take a real example. We will use python, but in general the problem can be solved similarly in many other languages:
number_list = [2, 5, 8, 3, 6, 20, 34, 68]
def larger_than_twenty(number):
return number > 20
How should we check if at least one element of the number_list
satisfies the larger_than_twenty
condition (i.e. larger_than_twenty(element) == True
)?
Option 1: Naive options
The most straight-forward way we can think of is to apply the function on every element:
number_list_checked = []
for number in number_list:
number_list_checked.append(larger_than_twenty(number))
or, a cleaner way using list comprehension. Note that this method is particular for Python.
number_list_checked = [larger_than_twenty(number) for number in number_list]
Since True
and False
are normally indicated as 1
and 0
, we can generally check if there’s at least one True
element by checking the sum of the list:
if sum(number_list_checked) > 0:
print("At least one element of the array is larger than 20")
else:
print("No element of the array is larger than 20")
Using list comprehension, we can write the whole thing as:
def at_least_one_larger_than_twenty(number_list):
return sum([larger_than_twenty(number) for number in number_list]) > 0
The solution now looks short and (perhaps) elegant enough for such a “simple” problem, but maybe not optimal. To understand why, let’s put some output into our larger_than_twenty()
function:
def larger_than_twenty(number):
print(f"Running larger_than_twenty({number})")
return number > 20
Try running the code, we get:
Running larger_than_twenty(2)
Running larger_than_twenty(5)
Running larger_than_twenty(8)
Running larger_than_twenty(3)
Running larger_than_twenty(6)
Running larger_than_twenty(20)
Running larger_than_twenty(34)
Running larger_than_twenty(68)
At least one element of the array is larger than 20
Now we can see why this solution might not be the optiomal one: After evaluating element 34
, it should be enough to conclude that there’s at least one element larger than 20, but element 68 was still accessed. This happens because we created a whole new list by applying the larger_than_twenty()
on every element of number_list
before evaluating the list as a whole. Imagine how much time we’d waste if the array is 1 billion element long, and the first element is larger than 20.
Option 2: Evaluating element by element
As we figured that the check should end as soon as it finds one element that satisfies the condition (larger than 20), we should not apply the condition onto every element before evaluating them. Instead, the condition should be evaluated as soon as one element is checked, something like this:
def at_least_one_larger_than_twenty(number_list):
for number in number_list:
if larger_than_twenty(number):
return True
return False
When re-running the code, we can see that it no longer accesses element 68
Running larger_than_twenty(2)
Running larger_than_twenty(5)
Running larger_than_twenty(8)
Running larger_than_twenty(3)
Running larger_than_twenty(6)
Running larger_than_twenty(20)
Running larger_than_twenty(34)
At least one element of the array is larger than 20
To illustrate how faster this solution to the previous one, let’s modify our number_list
a bit. Now the first element will be 21
instead of 2
(so it satisfies the larger_than_twenty()
condition, and the length of the list will be 100000 (one hundred thousand) times longer:
number_list = [21, 5, 8, 3, 6, 20, 34, 68]*100000
This is how it looks like on my computer (using time command):
# SOLUTION 1
python3 main.py 1,30s user 1,10s system 99% cpu 2,397 total
# SOLUTION 2
python3 main.py 0,05s user 0,01s system 98% cpu 0,064 total
Wait, what about my one-liner?
Using list comprehension is a good choice in most of the times, but for situation like this, where using break
or return
in the middle of the loop is necessary (or better), list comprehension doesn’t help. One may argue that sometimes clean code should be prioritied from speed (and I’m not disagreeing with that statement), but when the speed improvement becomes visible, adding one or two lines to your code wouldn’t hurt. Besides, everyone might have a slightly different idea on how “clean code” looks like, shorter code may not always be the cleaner one.
With that said, at least in Python, I found a way that may help saving those couple of lines. There’s a syntax called any
, whose purpose is to receive an iterable as parameter and return True
as soon as it accesses a True
element, and False
otherwise. From the Python Documentation, any(iterable)
is defined to be equivalent to:
def any(iterable):
for element in iterable:
if element:
return True
return False
So does it mean we can use any
with list-comprehension? Let’s figure out:
def at_least_one_larger_than_twenty(number_list):
return any([larger_than_twenty(number) for number in number_list])
If you edit your code like above and try running it, you will see that every element of the list will still be accessed. That’s not a problem of any()
, but the fact that we did use list comprehension. When we feed [larger_than_twenty(number) for number in number_list]
to any()
, what the function receives is the list of True
and False
, composed by applying larger_than_twenty()
onto every element of the number_list
. In other words, what we feed to any()
is the same as what we did to sum()
in the previous part, so it has the same bottle neck.
It should be noted that
any()
should be still faster thansum()
, since any will exit as soon as it sees a satisfies element, whilesum()
will loop through the whole list once again to compute the sum.
Luckily, we have a better way to use any()
. Turned out, a list is an iterable, but it doesn’t mean any()
only accepts lists. The following code might look kinda odd, but very powerful:
def at_least_one_larger_than_twenty(number_list):
return any(larger_than_twenty(number) for number in number_list)
Here is the result (using the same sample of 800,000 numbers we used earlier)
python3 main.py 0,03s user 0,01s system 98% cpu 0,038 total
It’s more or less the same result we got with the fast-but-ugly code of iterating through number_list
.
What we have done here is to create an iterable using the syntax larger_than_twenty(number) for number in number_list
. Normally, in the so-called list comprehension, we capture this iterable inside a []
pair, and we get a list. However, what many of us don’t know (me included, until I accidentally figured it out recently) is that list comprehension isn’t applied only to lists. Kinda mind-blowing, huh?
I hope this article was useful for you guys. See ya!