Dead Simple Python: List Comprehensions and Generator Expressions

Jason C. McDonald - Mar 6 '19 - Dev Community

Like the articles? Buy the book! Dead Simple Python by Jason C. McDonald is available from No Starch Press.


If I had to pick a favorite feature of Python, it would have to be list comprehensions, hands down. To my mind, they encapsulate the very essence of "Pythonic" code...which is ironic, since they're actually borrowed from Haskell.

I was so excited to get to them; this is the article I planned on writing a few weeks back, but I realized that understanding iterators would be essential to really grokking list comprehensions and their potential.

If you haven't read the previous two articles on Loops and Iterators and Iterator Power Tools, you'll want to go back and do that now.

List Comprehensions vs. Generator Expressions

Generator expressions are a concise way to generate containers. Most commonly, you'll hear of list comprehensions, but set comprehensions and dict comprehensions also exist. The difference in terms is somewhat important, however: it's only a list comprehension if you're actually making a list.

A generator expression is surrounded by parentheses ( ), while a list comprehension is surrounded by square brackets [ ]. Set comprehensions are enclosed in curly braces { }. Aside from those differences, the syntax is identical in all three cases!

(There's a bit more to a dict comprehension, which we'll talk about later.)
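
As a quick sketch of that bracket difference, here is the same expression wrapped three ways. The names numbers, as_list, as_set, and as_gen are mine, purely for illustration.

```python
numbers = [1, 2, 2, 3, 3, 3]

as_list = [n * 10 for n in numbers]   # square brackets: a list
as_set = {n * 10 for n in numbers}    # curly braces: a set (duplicates dropped)
as_gen = (n * 10 for n in numbers)    # parentheses: a generator object

print(as_list)                 # [10, 20, 20, 30, 30, 30]
print(as_set == {10, 20, 30})  # True
print(type(as_gen).__name__)   # generator
```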

Generator expressions have everything to do with generators, which we'll be exploring in depth in a later section. For now, we'll suffice with the official definition of a generator expression:

generator expression - An expression that returns an iterator.
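
To make "returns an iterator" a bit more concrete, here's a minimal sketch using next() and iter(), which we met in the earlier articles. The variable names are just for illustration.

```python
squares = (n ** 2 for n in range(3))

# An iterator returns itself from iter() and hands out values via next().
print(iter(squares) is squares)  # True
print(next(squares))             # 0
print(next(squares))             # 1

# Whatever is left can be consumed once; after that it is exhausted.
remaining = list(squares)
print(remaining)                 # [4]
print(list(squares))             # []
```

Nothing is computed until a value is asked for, which is the main practical difference from a list comprehension.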

Structure of a Generator Expression

A generator expression (or list/set comprehension) is a little like a for loop that has been flipped around.

For a simple example, let's recall an example from the last article, where we were converting a list of Fahrenheit temperatures to Celsius. I'll tweak it slightly, so the numbers will be stored in another list instead of printed directly.

temps_f = [67.0, 72.5, 71.3, 78.4, 62.1, 80.6]
temps_c = []

def f_to_c(temp):
    return round((temp - 32) / 1.8, 1)

for c in map(f_to_c, temps_f):
    temps_c.append(c)

print(temps_c)

Believe it or not, a list comprehension will allow me to reduce that entire program to three lines! We'll simplify it down one piece at a time, so you can see what I mean.

Let's start by replacing the for loop with a list comprehension...


temps_f = [67.0, 72.5, 71.3, 78.4, 62.1, 80.6]

def f_to_c(temp):
    return round((temp - 32) / 1.8, 1)

temps_c = [f_to_c(temp) for temp in temps_f]

print(temps_c)

The important line is temps_c = [f_to_c(temp) for temp in temps_f]. This behaves very much like map() does. For each element temp in the list temps_f, we apply the function f_to_c().
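
If you'd like to convince yourself of the similarity to map(), a small check like this should do it. The names via_map and via_comprehension are only for this sketch.

```python
temps_f = [67.0, 72.5, 71.3, 78.4, 62.1, 80.6]

def f_to_c(temp):
    return round((temp - 32) / 1.8, 1)

# map() gives back an iterator, so we wrap it in list() to compare.
via_map = list(map(f_to_c, temps_f))
via_comprehension = [f_to_c(temp) for temp in temps_f]

print(via_map == via_comprehension)  # True
```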

Now, if I were to need that f_to_c() function elsewhere, I'd actually stop here and call it good. However, if this was the only place where I needed the Fahrenheit-to-Celsius logic, I could eschew the function altogether, and move the logic directly into the comprehension:

temps_f = [67.0, 72.5, 71.3, 78.4, 62.1, 80.6]
temps_c = [round((temp-32) / 1.8, 1) for temp in temps_f]
print(temps_c)

What did I tell you? Three lines!

Depending on where I got the data from, I might even be able to reduce further. Let's see this with another example.

Imagine you have a program that receives a bunch of integers on a single line, separated by spaces, such as 5 4 1 9 5 7 5. You want to find the sum of all those integers. (For the sake of simplicity, assume you have no risk of bad input.)

Let's start by writing this the obvious way, without a list comprehension.

user_input = input()
values = user_input.split(' ')

total = 0

for v in values:
    n = int(v)
    total += n

print(total)

Fairly obvious, right? We get the user input as a string, and then split that string on the spaces to get the individual numbers. We create a variable for storing our total, and then use a loop to iterate over each value, convert it to an integer, and add it to the total. Now that we have working logic, let's simplify and optimize it.

Let's start by simplifying a few obvious things here. We've covered all these concepts before, so see if you can spot what I've improved.

values = input().split(' ')
total = 0

for v in values:
    total += int(v)

print(total)

We can't get much simpler than this unless we employ a generator expression, so let's do that now!

values = input().split(' ')
total = sum(int(v) for v in values)
print(total)

The generator expression here is (int(v) for v in values). For every value v in the list values, we cast it to an integer (int(v)).

Notice how I used the sum() function, passing the generator expression right to it. Since the expression got passed directly as the only argument, I didn't need an extra pair of parentheses around it.
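
Here's a small sketch of when the extra parentheses do become necessary. I'm using a fixed string in place of input(), and sum()'s optional start value as the second argument.

```python
values = "5 4 1 9 5 7 5".split(' ')

# Sole argument: the call's own parentheses are enough.
total = sum(int(v) for v in values)

# With a second argument (here sum()'s optional start value),
# the generator expression needs its own parentheses.
total_from_100 = sum((int(v) for v in values), 100)

print(total)           # 36
print(total_from_100)  # 136
```

Leaving out the inner parentheses in the second call is a syntax error.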

Now, if I didn't need the values list for anything else, I could actually move that logic right into the generator expression!

total = sum(int(v) for v in input().split(' '))
print(total)

Easy as pie, right?

Nested List Comprehensions

What if instead, we wanted the sum of the squares of every number entered? There are, in fact, two ways to do this. The obvious option is to do this:

total = sum(int(v)*int(v) for v in input().split(' '))
print(total)

That works, but somehow it just feels wrong, doesn't it? We're casting v to an integer twice.

We can get around this by nesting a list comprehension into our generator expression!

total = sum(n**2 for n in [int(v) for v in input().split(' ')])

List comprehensions and generator expressions are evaluated inner to outer. The innermost expression, int(v) for v in input().split(' '), is run first, and the enclosing square brackets [ ] convert that to a list (an iterable).

Next, the outer expression, n**2 for n in [LIST] is run, where [LIST] is that list we generated a moment ago.
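
As a rough sketch of the same idea with a fixed string standing in for input(), note that the inner expression doesn't have to be a list. Swapping its brackets for parentheses nests one generator expression inside another. The variable names here are my own.

```python
line = "5 4 1 9 5 7 5"  # stand-in for input()

# Inner list comprehension is built in full, then the outer expression walks it.
total_list_inner = sum(n**2 for n in [int(v) for v in line.split(' ')])

# With parentheses, the inner part is a generator expression,
# which feeds values through one at a time instead.
total_gen_inner = sum(n**2 for n in (int(v) for v in line.split(' ')))

print(total_list_inner, total_gen_inner)  # 222 222
```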

It can be easy for this nesting to get away from you. I try to use it sparingly. When I need nesting, I write each list comprehension on a separate line and store it...

the_list = [int(v) for v in input().split(' ')]
total = sum(n**2 for n in the_list)
print(total)

...test it out, and then start nesting via copy and paste.

Conditions in Generator Expressions

Let's make that example a bit trickier. What if we wanted the sum of the squares of only the even numbers in the list? Generator expressions and list comprehensions, awesomely enough, can do that too.

We'll stick with the non-nested version here, to make the new logic easier to see.

the_list = [int(v) for v in input().split(' ')]
total = sum(n**2 for n in the_list if n%2==0)
print(total)

The new part is on the second line. At the end of the generator expression, I added if n%2==0. You may recognize the modulo operator (%), which gives us the remainder of division. Any even number is divisible by 2, meaning it will have no remainder. Thus, n%2==0 is only true for even numbers.

It can feel a little weird, putting the conditional AFTER a statement, instead of before. The easiest way to understand it is to think about how the same code would look without the generator expression...

output = []
for n in the_list:
    if n%2==0:
        output.append(n**2)

Basically, to turn that into a generator expression, you simply grab the logic within append(), park it out in front...

n**2
for n in the_list:
    if n%2==0:

...and then remove the colons (:), line breaks, and indentation from the for and if statements...

n**2 for n in the_list if n%2==0
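
Putting the loop and the collapsed version side by side, with a small sample list of my own choosing, shows they produce the same thing:

```python
the_list = [1, 2, 3, 4, 5, 6]

# The traditional loop with a conditional...
output = []
for n in the_list:
    if n % 2 == 0:
        output.append(n**2)

# ...and the same logic collapsed into a list comprehension.
from_comprehension = [n**2 for n in the_list if n % 2 == 0]

print(output)              # [4, 16, 36]
print(from_comprehension)  # [4, 16, 36]
```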

Multiple Iterables

We can also use generator expressions and list comprehensions to loop through multiple iterables at once, in the same manner as a nested loop.

Consider the following logic:

num_a = [1, 2, 3, 4, 5]
num_b = [6, 7, 8, 9, 10]
output = []

for a in num_a:
    for b in num_b:
        output.append(a*b)

We can follow those same steps I gave a moment ago to turn that into a list comprehension as well! We bring the argument for append() out in front...

a*b
for a in num_a:
    for b in num_b:

...and then we collapse the rest down onto one line, removing the colons.

a*b for a in num_a for b in num_b

Finally, wrap it in square brackets, and assign it to output.

output = [a*b for a in num_a for b in num_b]
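
The order of the for clauses matters: the first one is the outer loop. Here's a short sketch with smaller lists so the ordering is easy to see. The comparison against itertools.product() is my own addition, just as a cross-check.

```python
from itertools import product

num_a = [1, 2, 3]
num_b = [10, 20]

# The first for clause is the outer loop, the second is the inner loop.
output = [a*b for a in num_a for b in num_b]
print(output)  # [10, 20, 20, 40, 30, 60]

# itertools.product yields the same pairs in the same order.
via_product = [a*b for a, b in product(num_a, num_b)]
print(via_product == output)  # True
```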

Set Comprehensions

As I mentioned at the start of the article, just as you can create a list using a generator expression wrapped in square brackets [ ], you can also create a set by using curly braces { } instead.

For example, let's generate a set of all the remainders you can get from dividing 100 by an odd number less than 100. By using a set, we're ensuring we have no duplicates, making the results easier to comprehend.

odd_remainders = {100%n for n in range(1,100,2)}
print(odd_remainders)

Running that code gives us...

{0, 1, 2, 3, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 16, 17, 18, 19, 21, 22, 23, 25, 26, 27, 29, 30, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49}

There really aren't any surprises here. Set comprehensions work the same as list comprehensions, except for which container is created.
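
To see the de-duplication at work, here's a quick sketch comparing the same expression as a list and as a set. The names remainders_list and remainders_set are just for illustration.

```python
remainders_list = [100 % n for n in range(1, 100, 2)]
remainders_set = {100 % n for n in range(1, 100, 2)}

print(len(remainders_list))  # 50
print(len(remainders_set))   # 36
print(remainders_set == set(remainders_list))  # True
```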

Dict Comprehensions

A dictionary comprehension follows almost the same structure as the other forms of generator expressions, but for one difference: colons.

If you recall, when you create a set or a dictionary, you use curly braces { }. The only difference is, in a dictionary, you use the colon : to separate key-value pairs, something you wouldn't do in a set. The same principle applies here.

For example, if we wanted to create a dictionary that stores an integer between 1 and 100 as the key, and the square of that number as the value...

squares = {n : n**2 for n in range(1,101)}
print(squares)

That's all there is to it! Again, besides the colon :, everything else is the same as any other generator expression.
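
Here's a smaller version of that example so the output fits on one line, plus a variant with a condition tacked on the end. The even_squares name is mine.

```python
squares = {n: n**2 for n in range(1, 6)}
print(squares)  # {1: 1, 2: 4, 3: 9, 4: 16, 5: 25}

# Conditions work the same way as in any other generator expression.
even_squares = {n: n**2 for n in range(1, 6) if n % 2 == 0}
print(even_squares)  # {2: 4, 4: 16}
```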

Hazards

It might be deeply tempting to use list comprehensions or generator expressions for absolutely everything. They're rather addictive, partly because one feels really smart when crafting one. There's something about powerful one-liners that gets programmers very excited - we really like being clever with our code.

However, I must caution you against going too crazy. Remember the Zen of Python! Here's the part that's relevant to this topic...

Beautiful is better than ugly.
...
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
...

List comprehensions can be beautiful, but they can also become dense, highly toxic pieces of logic when used unwisely.

1. They Get Unreadable Fast

I borrowed this example from a survey by OpenEDX.

primary = [ c for m in status['members'] if m['stateStr'] == 'PRIMARY' for c in rs_config['members'] if m['name'] == c['host'] ]
secondary = [ c for m in status['members'] if m['stateStr'] == 'SECONDARY' for c in rs_config['members'] if m['name'] == c['host'] ]
hidden = [ m for m in rs_config['members'] if m['hidden'] ]

Can you tell what's going on? You probably could if you read it for a while, but why would you want to? The code is as clear as mud. (In the survey, this was rated as the most unreadable example.) Sure, you could add comments to explain what's happening - they did in the original example, in fact - but any time you need a comment to explain what the code is doing, it's almost certainly too complicated.

List comprehensions and generator expressions are powerful, but it doesn't take much for them to become unreadable like this.

This doesn't necessarily have to be the case. You can regain a lot of readability simply by splitting up your list comprehension across multiple lines, in a similar structure to a traditional loop.

primary = [
    c
    for m in status['members']
        if m['stateStr'] == 'PRIMARY'
    for c in rs_config['members']
        if m['name'] == c['host']
    ]

secondary = [
    c
    for m in status['members']
        if m['stateStr'] == 'SECONDARY'
    for c in rs_config['members']
        if m['name'] == c['host']
    ]

hidden = [
    m
    for m in rs_config['members']
        if m['hidden']
    ]

Mind you, that doesn't totally justify the above. I'd still use traditional loops instead of the example shown, simply because they'd be easier to read and maintain.
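
For comparison, here's roughly how I might write the primary and secondary lookups with traditional loops. This is only a sketch: the sample data and the members_in_state() helper are hypothetical, shaped to match the keys used in the example above.

```python
# Hypothetical sample data, shaped only to match the keys used above.
status = {'members': [
    {'name': 'db1:27017', 'stateStr': 'PRIMARY'},
    {'name': 'db2:27017', 'stateStr': 'SECONDARY'},
]}
rs_config = {'members': [
    {'host': 'db1:27017', 'hidden': False},
    {'host': 'db2:27017', 'hidden': False},
]}

def members_in_state(state):
    """Return config entries whose host is reported in the given state."""
    matches = []
    for m in status['members']:
        if m['stateStr'] != state:
            continue
        for c in rs_config['members']:
            if m['name'] == c['host']:
                matches.append(c)
    return matches

primary = members_in_state('PRIMARY')
secondary = members_in_state('SECONDARY')
print([c['host'] for c in primary])    # ['db1:27017']
print([c['host'] for c in secondary])  # ['db2:27017']
```

It's longer, but the duplicated logic now lives in one place, and each step can be inspected in a debugger.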

Still not convinced that this is an easily abused feature of Python? My IRC pal grym was kind enough to share this real world example he encountered. We have no idea what it does.

cropids = [self.roidb[inds[i]]['chip_order'][
               self.crop_idx[inds[i]] % len(self.roidb[inds[i]]['chip_order'])] for i in
           range(cur_from, cur_to)]

My soul burns just looking at that.

2. They Don't Replace Loops

grym pointed out the following scenario. (This code is fictitious, FYI.)

some_list = getTheDataFromWhereever()
[API.download().process(foo) for foo in some_list]

That looks innocuous enough to the untrained eye, but note what's happening...each item in some_list is being processed (and possibly mutated) purely for the side effects, while the list the comprehension builds is thrown away instead of being stored. This is a case of the list comprehension, or even the generator expression, being abused to take the place of a loop. It makes for some difficult reading, not to mention debugging.

No matter how clever you want to be with generator expressions, this is one case where you should stick to loops:

some_list = getTheDataFromWhereever()
for foo in some_list:
    API.download().process(foo)

3. They Can Be Hard to Debug

Think about the nature of a list comprehension: you're packing everything into one gigantic statement. The benefit to this is that you eliminate a bunch of intermediate steps. The drawback is that...you eliminate a bunch of intermediate steps.

Think about debugging a typical loop. You can step through it, one iteration at a time, using your debugger to observe the state of each variable as you go. You can also use error handling to deal with unusual edge cases.

By contrast, none of that works in a generator expression or list comprehension. Everything either works, or it doesn't! You can try to parse through the errors and output to figure out what you did wrong, but I can assure you, it's a confusing experience.

You can avoid some of this madness by avoiding list comprehensions on your first version of the code! Write the logic the obvious way, using traditional loops and iterator tools. Once you know it's working, then and only then should you collapse the logic into a generator expression, and only if you can do so without eschewing error handling.
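
Here's a minimal sketch of that workflow, assuming some input that contains a bad token. The to_int_or_none() helper is a hypothetical name of my own. Since try/except can't appear inside the expression itself, the error handling has to move into a function before the loop can be collapsed.

```python
raw = "5 4 x 9".split(' ')  # hypothetical input with one bad token

# First version: a plain loop, where try/except fits naturally.
total = 0
for v in raw:
    try:
        total += int(v)
    except ValueError:
        continue  # skip anything that is not an integer

# Collapsed version: the error handling moves into a helper function.
def to_int_or_none(text):
    try:
        return int(text)
    except ValueError:
        return None

parsed = (to_int_or_none(v) for v in raw)
total_collapsed = sum(n for n in parsed if n is not None)

print(total, total_collapsed)  # 18 18
```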

This may sound like a lot of extra work, but I follow this exact pattern in competitive code golfing. My understanding of generator expressions is usually my main advantage against less experienced competitors, but I always write the standard loop first: I cannot afford to waste time debugging bad logic in a generator expression.

Review

List comprehensions, generator expressions, set comprehensions, and dictionary comprehensions are an exciting feature of Python. They allow you to write very powerful, compact code.

They are not without their limits and drawbacks, however. You should carefully weigh your options before using a generator expression. Even if you decide to use one, it is safest to write your logic using standard loops and iterators first, and then rewrite it as a generator expression.

Let's review the key points...

  • A generator expression follows the structure <expression> for <name> in <iterable> if <condition>. Optionally, the if section can be left off. Multiple for...in and if sections can be used in one generator expression.
  • You can change a standard loop and conditional block to a generator expression by moving the innermost code to the front, and removing the colons after the remaining statements, typically moving them all onto one line.
  • Nested generator expressions and list comprehensions are permitted.
  • A list comprehension produces a list. It is a generator expression wrapped in square brackets [ ].
  • A set comprehension produces a set. It is a generator expression wrapped in curly braces { }.
  • A dict comprehension produces a dict. It is a generator expression wrapped in curly braces { }, with the key-value pair in the expression, separated by a colon :.
  • Generator expressions in any form aren't intended as outright replacements to standard loops. Use wisdom in applying them, especially as they can be hard to read, understand, or debug.

Generator expressions are just the tip of the iceberg that is generators. We'll be exploring that topic in the next section.

As usual, I strongly recommend that you read the documentation:


Thank you to altendky, grym, and nedbat (Freenode IRC #python) for suggested revisions and inclusions.
