Dead Simple Python: Iteration Power Tools

Jason C. McDonald - Mar 2 '19 - Dev Community

Like the articles? Buy the book! Dead Simple Python by Jason C. McDonald is available from No Starch Press.


The previous section more or less ended on a cliffhanger.

When we last left our hero, we'd discovered loops and iterators in Python, and tasted a bit of the potential they offer.

I got a few choice words from my editors: "you left out zip() and enumerate(), and those are surely very important to any discussion on iterators!" Yes, they are, but the article was getting a bit long. Never fear, though - we're about to tackle them, and many more!

By the way, if you haven't read the previous section, "Loops and Iterators," you'll want to go back and do that now! Don't worry, I'll wait.

It may seem strange to dedicate an entire article to a handful of built-in functions, but these contribute a lot of magic to Python.

Revisiting range

Remember the range() function from the previous article? We briefly covered how it could be used to generate a sequence of numbers, but it has more power than it seems to at first glance.

Start and Stop

The first hurdle to using range() is understanding the arguments: range(start, stop). start is inclusive; we start on that actual number. stop, however, is exclusive, meaning we stop before we get there!

So, if we have range(1, 10), we get [1, 2, 3, 4, 5, 6, 7, 8, 9]. We start on 1, but we never actually get to 10; we stop one short.

If we wanted to include 10 in our sequence, we'd need range(1, 11): [1, 2, 3, 4, 5, 6, 7, 8, 9, 10].

By the way, if we only specify one argument, like range(10), it will assume the start of the range is 0. In this case, we'd get [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]. You'll see range() used in this manner quite often when it is used to control a traditional for loop.
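Here's a quick sketch of those three calls side by side. I'm wrapping each one in list() (more on that shortly) purely so we can see the numbers, and the variable names are just ones I made up for illustration.

```python
# range() is lazy, so wrap it in list() to see the numbers it produces.
one_to_nine = list(range(1, 10))   # start is inclusive, stop is exclusive
one_to_ten = list(range(1, 11))    # bump stop up by one to include 10
zero_to_nine = list(range(10))     # a single argument is treated as stop

print(one_to_nine)   # [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(one_to_ten)    # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(zero_to_nine)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```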

Skipping Along

My favorite trick with range() is its optional third argument: when you specify range(start, stop, step), the step argument allows you to count by a value other than 1.

One use might be to print out all the multiples of 7, from 7 itself to 700, inclusively: range(7, 701, 7) would do just that. (Take note, I specified 701 for the end, to ensure 700 would be included.)

Another use might be to print all odd numbers less than 100: range(1, 100, 2).
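As a rough sketch, here are both of those ranges, plus one extra that isn't covered above: a negative step, which counts downward. The names sevens, odds, and countdown are only for illustration.

```python
sevens = range(7, 701, 7)       # multiples of 7, from 7 to 700 inclusive
odds = range(1, 100, 2)         # odd numbers less than 100
countdown = range(10, 0, -1)    # a negative step counts down; stop is still exclusive

print(len(sevens), sevens[0], sevens[-1])  # 100 7 700
print(len(odds), odds[-1])                 # 50 99
print(list(countdown))                     # [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
```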

Storing Ranges

If you're trying ranges out, you'll probably notice that this doesn't do what you expect:

sevens = range(7, 701, 7)
print(sevens)

The print command prints the literal phrase range(7, 701, 7). That's not what we wanted!

Remember, range() doesn't hand back a list. It returns a range object, which produces its numbers on demand, much like an iterator does (though it isn't exactly one). To store those numbers as a list outright, we need to explicitly turn the range into one by wrapping it in the list() function:

sevens = list(range(7, 701, 7))
print(sevens)

Now that output is what we wanted - a list of the first hundred multiples of 7!
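If you're curious how a range differs from a true iterator, this little sketch pokes at it a bit. Unlike an iterator, a range knows its length, can be indexed, and can be looped over more than once. The variable names here are hypothetical.

```python
sevens = range(7, 701, 7)

count = len(sevens)             # a range knows how many values it holds
third = sevens[2]               # it can be indexed without building a list
has_343 = 343 in sevens         # membership tests work too
first_three = list(sevens[:3])  # slicing a range gives another range

# A range can be walked as many times as we like; it never runs dry.
first_pass = list(sevens)
second_pass = list(sevens)

print(count, third, has_343)      # 100 21 True
print(first_three)                # [7, 14, 21]
print(first_pass == second_pass)  # True
```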

Slicing

Before we jump into all this new iteration goodness, I want to introduce you to extended indexing notation, which allows us to more powerfully select elements from an ordered container, such as a list.

Speaking of lists, let's put one together:

dossiers = ['Sir Vile', 'Dr. Belljar', 'Baron Grinnit', 'Medeva', 'General Mayhem', 'Buggs Zapper', 'Jacqueline Hyde', 'Jane Reaction', 'Dee Cryption']

Whether you realize it or not, you already know normal index notation.

print(dossiers[1])
>>> Dr. Belljar

That returned the second element (index 1) of the dossiers container. Simple enough, right? Virtually all languages offer this behavior.

So, what if we want the second and third elements?

print(dossiers[1:3])
>>> ['Dr. Belljar', 'Baron Grinnit']

What just happened? In extended indexing notation, we have three arguments, separated by colons: start, stop, and step. Hey, sound familiar? It should - those are the same arguments that range() uses! They work exactly the same way, too. (Of course, we left off the third argument [step] in the example above.)

Take note, that example printed out Dr. Belljar (index 1) and Baron Grinnit (index 2), but not Medeva, because the stop argument is exclusive; we stop just short of it.

Do take note, start must be less than stop for you to get any results! There is an exception, though, which we'll talk about shortly.

Now, what if you wanted every other dossier, starting with the second one?

print(dossiers[1::2])
>>> ['Dr. Belljar', 'Medeva', 'Buggs Zapper', 'Jane Reaction']

You'll notice that we didn't specify a stop. We actually didn't need to! Extended indexing notation allows you to leave out any argument, so long as you have the colons to separate everything. Since the second argument was omitted, we just put the extra : after where it would have been.
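To see a few more combinations of omitted arguments, here's a small sketch using the same dossiers list. The variable names on the left are just for illustration.

```python
dossiers = ['Sir Vile', 'Dr. Belljar', 'Baron Grinnit', 'Medeva',
            'General Mayhem', 'Buggs Zapper', 'Jacqueline Hyde',
            'Jane Reaction', 'Dee Cryption']

first_three = dossiers[:3]   # no start: begin at the front
from_six = dossiers[6:]      # no stop: run to the end
every_third = dossiers[::3]  # no start or stop: the whole list, by threes
copy = dossiers[:]           # nothing at all: a shallow copy of the list

print(first_three)  # ['Sir Vile', 'Dr. Belljar', 'Baron Grinnit']
print(from_six)     # ['Jacqueline Hyde', 'Jane Reaction', 'Dee Cryption']
print(every_third)  # ['Sir Vile', 'Medeva', 'Jacqueline Hyde']
print(copy == dossiers, copy is dossiers)  # True False
```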

Going Backwards

Extended indexing notation takes the (start, stop, step) logic one step further, by allowing you to work BACKWARDS! This is a bit of a brain twister at first, though, so hang on tight...

print(dossiers[-1])

That prints out the last item in the list. Negative numbers start counting from the end of the list! This can feel a little weird, since we're used to counting from 0 in indexes, but negative zero isn't really a thing, so we start with -1.

Given that, how do we print the last three items? We might try this, but it won't actually work....

print(dossiers[-1:-4])
>>> []

That returns an empty list. Why? Remember, start must be less than stop, even when working with negative indices. So, we have to put -4 as our start, since -4 < -1.

print(dossiers[-4:-1])
>>> ['Buggs Zapper', 'Jacqueline Hyde', 'Jane Reaction']

That's closer, but there's still a problem. Dee Cryption is our last item, so where is she? Remember, stop is exclusive; we stop just shy of it. But we can't just say dossiers[-4], since that'll only give us Buggs Zapper. And dossiers[-4:-0] won't help either: -0 is just 0, so that slice comes back empty.

The way to solve this is to tell Python we are explicitly omitting the second argument: put a colon after our first argument!

print(dossiers[-4:])
>>> ['Buggs Zapper', 'Jacqueline Hyde', 'Jane Reaction', 'Dee Cryption']

Great, now we see to the end, except now we have too much information. We want the last three, so let's change -4 to -3...

print(dossiers[-3:])
>>> ['Jacqueline Hyde', 'Jane Reaction', 'Dee Cryption']

Thar she blows!

Speaking of magic, what do you suppose would happen if we put a negative number in the third argument, step? Let's try -1, with two colons preceding it, to indicate we want the whole list.

print(dossiers[::-1])
>>> ['Dee Cryption', 'Jane Reaction', 'Jacqueline Hyde', 'Buggs Zapper', 'General Mayhem', 'Medeva', 'Baron Grinnit', 'Dr. Belljar', 'Sir Vile']

Hey, that prints everything backwards! Indeed, a step of -1 reverses the list.

Now let's try -2...

print(dossiers[::-2])
>>> ['Dee Cryption', 'Jacqueline Hyde', 'General Mayhem', 'Baron Grinnit', 'Sir Vile']

Not only did that reverse the list, but it skipped every other element. A negative step behaves exactly like a positive step, except it works backwards!

So, what if we wanted to put everything together? Perhaps we want to list a few elements from the middle of the list in reverse order...

print(dossiers[2:5:-1])
>>> []

Gotcha Alert: start and stop must be in the order of traversal. If step is positive, start must be less than stop; however, if step is negative, start must be greater than stop!

You can think of it like walking directions for a photo tour. step tells you which way to walk, and how big your stride should be. You start taking photos once you reach start, and as soon as you encounter stop, you put your camera away.

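So, to fix that, we need to swap our start and stop. Keep in mind that start is still inclusive and stop is still exclusive, so we now get indexes 5, 4, and 3.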

print(dossiers[5:2:-1])
>>> ['Buggs Zapper', 'General Mayhem', 'Medeva']

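Side Note: Python also provides the slice() and itertools.islice() functions, which behave in much the same way. slice() builds a reusable slice object you can drop between the square brackets, while itertools.islice() works on any iterable but doesn't accept negative values. For lists and other ordered containers, you're almost always best off sticking with the extended indexing notation.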
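For the curious, here's a minimal sketch of those two functions in action. The variable names are mine, and the try block is only there to show what happens when islice() is handed a negative number.

```python
from itertools import islice

dossiers = ['Sir Vile', 'Dr. Belljar', 'Baron Grinnit', 'Medeva',
            'General Mayhem', 'Buggs Zapper', 'Jacqueline Hyde',
            'Jane Reaction', 'Dee Cryption']

# slice() builds the same kind of object that [5:2:-1] creates behind the scenes.
middle_backwards = slice(5, 2, -1)
from_slice = dossiers[middle_backwards]
print(from_slice)  # ['Buggs Zapper', 'General Mayhem', 'Medeva']

# islice() accepts any iterable, even a plain iterator that can't be indexed.
names = iter(dossiers)
every_other = list(islice(names, 1, 6, 2))
print(every_other)  # ['Dr. Belljar', 'Medeva', 'Buggs Zapper']

# islice() rejects negative values, since an iterator can't be walked backwards.
try:
    islice(dossiers, -3, None)
    negatives_allowed = True
except ValueError:
    negatives_allowed = False
print(negatives_allowed)  # False
```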

Playing With Iterables

The rest of the functions we'll be exploring in this section work with iterables. While I'll use lists for most examples, remember that you can use any iterable with these, including the object returned by range().

all and any

Imagine you have a whole bunch of data, such as hundreds of names, in an iterable container like a list. Before you feed that list into your super brilliant algorithm, you want to save some processing time by checking that every single element actually holds a non-empty value, no exceptions.

This is what the all function is for.

dossiers = ['Sir Vile', 'Dr. Belljar', 'Baron Grinnit', 'Medeva', 'General Mayhem', 'Buggs Zapper', '', 'Jane Reaction', 'Dee Cryption']
print(all(dossiers))
>>> False

You may recall, an empty string ('') evaluates to False in Python. The all() function evaluates each element and checks whether it is truthy. If even one evaluates to False, all() returns False.

any() works in almost the same way, except it only requires a single element to evaluate to True.

These may not seem terribly useful at first blush, but when combined with some of the other tools, or even with list comprehensions (later section), they can save a lot of time!
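Here's a small sketch comparing the two. The blank list is something I made up for the example, and the last two lines show an edge case worth knowing: how each function treats an empty iterable.

```python
dossiers = ['Sir Vile', 'Dr. Belljar', 'Baron Grinnit', 'Medeva',
            'General Mayhem', 'Buggs Zapper', '', 'Jane Reaction',
            'Dee Cryption']
blank = ['', '', '']  # hypothetical list where every name is missing

print(all(dossiers))  # False: one element is an empty string
print(any(dossiers))  # True: at least one element is non-empty
print(any(blank))     # False: nothing in here is truthy

# With no elements at all, all() has nothing to object to, and any() has nothing to find.
print(all([]))        # True
print(any([]))        # False
```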

enumerate

Within a loop, if you need to access both the values of a list and their indices, you can do that with the enumerate() function.

foo = ['A', 'B', 'C', 'D', 'E']

for index, value in enumerate(foo):
    print(f'Element {index} has the value {value}.')

enumerate() isn't limited to lists, however. Like all these other functions, it works on any iterable, numbering (or enumerating) each of the values returned. For example, we can use it on range(). Let's use it to print out every multiple of 10 from 10 to 100 (range(10,101,10)). We'll enumerate that...

for index, value in enumerate(range(10, 101, 10)):
    print(f'Element {index} has the value {value}.')

That gives us...

Element 0 has the value 10.
Element 1 has the value 20.
Element 2 has the value 30.
Element 3 has the value 40.
Element 4 has the value 50.
Element 5 has the value 60.
Element 6 has the value 70.
Element 7 has the value 80.
Element 8 has the value 90.
Element 9 has the value 100.

Hmm, rather interesting. We could make a neat pattern out of this, but we'd have to start the enumeration at 1, instead of 0. Sure enough, we can do that by passing the starting count as the second argument. We'll also tweak our message a bit, just to take advantage of the pattern to do something kinda neat.

for index, value in enumerate(range(10,101,10), 1):
    print(f'{index} times 10 equals {value}')

When we run that, we get...

1 times 10 equals 10
2 times 10 equals 20
3 times 10 equals 30
4 times 10 equals 40
5 times 10 equals 50
6 times 10 equals 60
7 times 10 equals 70
8 times 10 equals 80
9 times 10 equals 90
10 times 10 equals 100
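Under the hood, enumerate() simply hands us (index, value) tuples, which the for loop unpacks into two names. This sketch turns the results into lists so we can look at them directly; the starting count can also be passed by name as start. The variable names are hypothetical.

```python
foo = ['A', 'B', 'C', 'D', 'E']

# Each item produced by enumerate() is an (index, value) tuple.
pairs = list(enumerate(foo))
print(pairs)  # [(0, 'A'), (1, 'B'), (2, 'C'), (3, 'D'), (4, 'E')]

# The starting count can be given as a keyword argument.
numbered = list(enumerate(foo, start=1))
print(numbered)  # [(1, 'A'), (2, 'B'), (3, 'C'), (4, 'D'), (5, 'E')]
```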

filter

Let's imagine we're tracking the number of clues we find at a bunch of locations, perhaps storing them in a dictionary. I'll borrow and tweak a dictionary from the last section for this example...

locations = {
    'Parade Ground': 0,
    'Ste.-Catherine Street': 0,
    'Pont Victoria': 0,
    'Underground City': 3,
    'Mont Royal Park': 0,
    'Fine Arts Museum': 0,
    'Humor Hall of Fame': 2,
    'Lachine Canal': 4,
    'Montreal Jazz Festival': 1,
    'Olympic Stadium': 0,
    'St. Lawrence River': 2,
    'Old Montréal': 0,
    'McGill University': 0,
    'Chalet Lookout': 0,
    'Île Notre-Dame': 0
    }

Perhaps we need to find all the locations that have clues, and ignore the rest. We'll start by writing a function to test a particular key-value tuple pair. This may seem like a ridiculous overcomplication, but it will make sense in a moment:

def has_clues(pair):
    return bool(pair[1])

We'll be submitting each pair from the dictionary to the function as a tuple (e.g. ('Underground City', 3)), so pair[1] will be the value. The built-in function bool() will return False if the number is 0, and True for everything else, which is exactly what we want.

We use the filter() function to narrow down our dictionary, using that function we just wrote. Recall from the last section, we need to use locations.items() to get both the keys and values as pairs.

for place, clues in filter(has_clues, locations.items()):
    print(place)

Take note, we don't include the parentheses after has_clues. We are passing the actual function as an object! filter() will do the actual calling.

Sure enough, running that code prints out the five places where we had clues (values > 0)...

Underground City
Humor Hall of Fame
Lachine Canal
Montreal Jazz Festival
St. Lawrence River

Later in this series, we'll learn about lambdas, anonymous functions that will allow us to do away with the extra function altogether. As a preview, here's what that would actually look like...

for place, clues in filter(lambda x: bool(x[1]), locations.items()):
    print(place)
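Two more things you can do with filter(), sketched on a trimmed-down copy of the dictionary (I cut it to four entries just to keep the example short). First, the filtered pairs can be fed straight back into dict(). Second, passing None in place of a function tells filter() to keep only the truthy elements.

```python
# Trimmed-down copy of the locations dictionary, for illustration only.
locations = {
    'Parade Ground': 0,
    'Underground City': 3,
    'Lachine Canal': 4,
    'Olympic Stadium': 0,
}

def has_clues(pair):
    # pair is a (place, clue_count) tuple from locations.items()
    return bool(pair[1])

# Rebuild a dictionary from only the pairs that passed the test.
with_clues = dict(filter(has_clues, locations.items()))
print(with_clues)  # {'Underground City': 3, 'Lachine Canal': 4}

# With None as the function, filter() drops every falsy element (here, the zeros).
nonzero_counts = list(filter(None, locations.values()))
print(nonzero_counts)  # [3, 4]
```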

map

map() functions in a similar way to filter(), except instead of using the function to omit elements from the iterable, it is used to change them.

Let's imagine we have a list of temperatures in Fahrenheit:

temps = [67.0, 72.5, 71.3, 78.4, 62.1, 80.6]

We want to convert those all to Celsius, so we write a function for that.

def f_to_c(temp):
    return round((temp - 32) / 1.8, 1)

We can use the map() function to apply that to each value in temps, producing an iterator we can use in a loop (or anywhere).

for c in map(f_to_c, temps):
    print(f'{c}°C')

Remember, we're passing the function object f_to_c as the first argument of map(), so we leave the parentheses off!

Running that loop gives us:

19.4°C
22.5°C
21.8°C
25.8°C
16.7°C
27.0°C
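One thing to watch for: map() gives back an iterator, and an iterator can only be walked once. This sketch stores the results in a list and then shows what's left over on a second attempt. The names first_pass and second_pass are just for illustration.

```python
temps = [67.0, 72.5, 71.3, 78.4, 62.1, 80.6]

def f_to_c(temp):
    # Convert Fahrenheit to Celsius, rounded to one decimal place.
    return round((temp - 32) / 1.8, 1)

celsius = map(f_to_c, temps)

first_pass = list(celsius)   # consumes the iterator
second_pass = list(celsius)  # nothing left to produce

print(first_pass)   # [19.4, 22.5, 21.8, 25.8, 16.7, 27.0]
print(second_pass)  # []
```

If you need the converted values more than once, keep the list around instead of the map object.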

min and max

Let's keep working with those temperatures for a moment. If we wanted to find the lowest or the highest in the list, we could use the min() or max() functions, respectively. Not much to this, really.

temps = [67.0, 72.5, 71.3, 78.4, 62.1, 80.6]
print(min(temps))
>>> 62.1
print(max(temps))
>>> 80.6

Side Note: Unrelated to iterables, you can also use those functions to find the smallest or largest of a list of arguments you give them, such as min(4, 5, 6, 7, 8), which would return 4.
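Okay, there's a little more to them than that. Both functions accept an optional key function that decides what gets compared, and a default to fall back on when the iterable is empty. Here's a rough sketch using the dossiers list from earlier and the built-in len() as the key; the variable names are hypothetical.

```python
dossiers = ['Sir Vile', 'Dr. Belljar', 'Baron Grinnit', 'Medeva',
            'General Mayhem', 'Buggs Zapper', 'Jacqueline Hyde',
            'Jane Reaction', 'Dee Cryption']

# key=len compares the names by length instead of alphabetically.
shortest = min(dossiers, key=len)
longest = max(dossiers, key=len)
print(shortest)  # Medeva
print(longest)   # Jacqueline Hyde

# Without default, min() and max() raise ValueError on an empty iterable.
nothing = min([], default=None)
print(nothing)   # None
```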

sorted

Often, you'll want to sort an iterable. The built-in sorted() function handles this efficiently, returning a new sorted list and leaving the original untouched.

temps = [67.0, 72.5, 71.3, 78.4, 62.1, 80.6]
for t in sorted(temps):
    print(t)

That produces...

62.1
67.0
71.3
72.5
78.4
80.6
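A quick sketch of two details about sorted(): it always hands back a brand new list, and it takes an optional reverse argument if you want the order flipped. The variable names are mine.

```python
temps = [67.0, 72.5, 71.3, 78.4, 62.1, 80.6]

ascending = sorted(temps)
descending = sorted(temps, reverse=True)

print(ascending)   # [62.1, 67.0, 71.3, 72.5, 78.4, 80.6]
print(descending)  # [80.6, 78.4, 72.5, 71.3, 67.0, 62.1]
print(temps)       # [67.0, 72.5, 71.3, 78.4, 62.1, 80.6] (original order intact)
```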

reversed

Most of the time, the extended indexing notation [::-1] will allow you to reverse a list or other ordered iterable. But if that's not an option, you can also use the reversed() function.

For example, I'll combine it with the sorted() function from a moment ago...

temps = [67.0, 72.5, 71.3, 78.4, 62.1, 80.6]
for t in reversed(sorted(temps)):
    print(t)

That gives us...

80.6
78.4
72.5
71.3
67.0
62.1
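Here's a small sketch of how reversed() behaves on its own. It returns an iterator rather than a list, it works on ranges as well as lists, and it needs something it can walk from the back, so a plain iterator won't do. The variable names are hypothetical.

```python
temps = [67.0, 72.5, 71.3, 78.4, 62.1, 80.6]

# reversed() returns an iterator, so wrap it in list() to see the values.
backwards = list(reversed(temps))
print(backwards)                 # [80.6, 62.1, 78.4, 71.3, 72.5, 67.0]
print(backwards == temps[::-1])  # True

# It works on a range, too.
countdown = list(reversed(range(1, 6)))
print(countdown)                 # [5, 4, 3, 2, 1]

# A plain iterator has no "end" to start from, so reversed() refuses it.
try:
    reversed(iter(temps))
    iterator_reversible = True
except TypeError:
    iterator_reversible = False
print(iterator_reversible)       # False
```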

sum

Another quick built-in function is sum(), which adds all of the elements in the iterable together. Naturally, this only works if all the elements can be added together.

One use of this would be in finding an average of those temperatures earlier. You may recall that the len() function tells us how many elements are in a container.

temps = [67.0, 72.5, 71.3, 78.4, 62.1, 80.6]
average = sum(temps) / len(temps)
print(round(average, 2))
>>> 71.98 
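A couple more things about sum(), sketched with a list of clue counts. It accepts an optional second argument as the starting total, and it really does insist on elements that can be added to a number. The value 100 and the variable names are just made up for the example.

```python
clues = [0, 0, 0, 3, 0, 0, 2, 4, 1, 0, 2, 0, 0, 0, 0]

total_clues = sum(clues)
print(total_clues)  # 12

# The optional second argument is the value the total starts from.
with_head_start = sum(clues, 100)
print(with_head_start)  # 112

# Strings can't be added to the starting value of 0, so this fails.
try:
    sum(['Sir Vile', 'Medeva'])
    strings_summed = True
except TypeError:
    strings_summed = False
print(strings_summed)  # False
```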

zip

Remember that earlier example about the locations and clues? Imagine we got that information, not in a dictionary, but in two lists:

locations = ['Parade Ground', 'Ste.-Catherine Street', 'Pont Victoria', 'Underground City', 'Mont Royal Park', 'Fine Arts Museum', 'Humor Hall of Fame', 'Lachine Canal', 'Montreal Jazz Festival', 'Olympic Stadium', 'St. Lawrence River', 'Old Montréal', 'McGill University', 'Chalet Lookout', 'Île Notre-Dame']
clues = [0, 0, 0, 3, 0, 0, 2, 4, 1, 0, 2, 0, 0, 0, 0]

Yuck! That's not fun to work with, although there are certainly real world scenarios where we would get data in this fashion.

Thankfully, the zip() function can help us make sense of this data by aggregating it into tuples using an iterator, giving us (locations[0], clues[0]), (locations[1], clues[1]), (locations[2], clues[2]) and so on.

The zip() function isn't even limited to two iterables; it can zip together as many as we give it! If the iterables don't all have the same length, zip() stops as soon as the shortest one runs out, and any "extras" in the longer ones are left out.

Of course, in this case, the two lists are the same length, so the results are rather obvious. Let's create a new list using the data from zip, and print it out.

data = list(zip(locations, clues))
print(data)

That gives us a structure not unlike what we got from the dictionary's .items() method earlier!

[('Parade Ground', 0), ('Ste.-Catherine Street', 0), ('Pont Victoria', 0), ('Underground City', 3), ('Mont Royal Park', 0), ('Fine Arts Museum', 0), ('Humor Hall of Fame', 2), ('Lachine Canal', 4), ('Montreal Jazz Festival', 1), ('Olympic Stadium', 0), ('St. Lawrence River', 2), ('Old Montréal', 0), ('McGill University', 0), ('Chalet Lookout', 0), ('Île Notre-Dame', 0)]

In fact, if I recall my filter() function with the lambda, I can tweak it to use zip, letting us work purely from the two lists:

for place, count in filter(lambda x: bool(x[1]), zip(locations, clues)):
    print(place)

As before, that outputs...

Underground City
Humor Hall of Fame
Lachine Canal
Montreal Jazz Festival
St. Lawrence River
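It's worth checking what zip() actually does when the lengths don't match. In this sketch, I've made up two short lists (places and counts) where one is longer than the other. zip() quits when the shorter list runs out, while itertools.zip_longest() keeps going and pads the gaps with a fill value of our choosing.

```python
from itertools import zip_longest

# Hypothetical mismatched data: three places, but only two counts.
places = ['Underground City', 'Lachine Canal', 'Olympic Stadium']
counts = [3, 4]

# zip() stops as soon as the shortest iterable is exhausted.
zipped = list(zip(places, counts))
print(zipped)  # [('Underground City', 3), ('Lachine Canal', 4)]

# zip_longest() runs to the end of the longest, filling in the blanks.
padded = list(zip_longest(places, counts, fillvalue=0))
print(padded)  # [('Underground City', 3), ('Lachine Canal', 4), ('Olympic Stadium', 0)]

# Pairs from zip() can also be turned straight into a dictionary.
as_dict = dict(zip(places, counts))
print(as_dict)  # {'Underground City': 3, 'Lachine Canal': 4}
```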

itertools

I've covered virtually all of Python's built-in functions for working with iterables, but there are still many more to be had in the itertools module. I strongly recommend reading the documentation to learn more.
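Just as a taste, here's a tiny sketch using two itertools functions I picked more or less at random: chain(), which strings several iterables together, and count(), which counts upward forever. The lists are made up for the example.

```python
from itertools import chain, count

# Two hypothetical lists of names we want to treat as one.
first_batch = ['Sir Vile', 'Dr. Belljar']
second_batch = ['Medeva']

# chain() walks through each iterable in turn, as if they were one.
everyone = list(chain(first_batch, second_batch))
print(everyone)  # ['Sir Vile', 'Dr. Belljar', 'Medeva']

# count(1) never ends on its own, but zip() stops when the names run out.
numbered = list(zip(count(1), everyone))
print(numbered)  # [(1, 'Sir Vile'), (2, 'Dr. Belljar'), (3, 'Medeva')]
```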

Review

This section has been a bit more encyclopedic in nature than the rest of the series, but I hope it's given you an appreciation for some of the incredible things you can do with your new iterator skills.

If you're still waiting on those long-promised generators and list comprehensions, never fear! They're coming up in the very next sections.

As always, I recommend that you read the documentation.