Python Data Structures Idioms

Ivan Mushketyk - Sep 29 '17 - - Dev Community

Significant portion of our time we as a developers spend writing code that manipulates basic data structures: traverse a list, create a map, filter elements in a collection. Therefore it is important to know how effectively do it in Python and make your code more readable and efficient.

Using lists

Iterate over a list

There are many ways to iterate over a list in Python. And the simplest way would be just to maintain current position in list and increment it on each iteration:

## SO WRONG
l = [1, 2, 3, 4, 5]
i = 0
while i < len(l):
    print l[i]
    i += 1

Enter fullscreen mode Exit fullscreen mode

This works, but Python provides a more convenient way to do using range function. range function can be used to generate numbers from 0 to N and this can be used as an analog of a for loop in C:

## STILL WRONG
for i in range(len(l)):
    print l[i]

Enter fullscreen mode Exit fullscreen mode

While this is more concise, there is a better way to do it since Python let us iterate over a list directly, similarly to foreach loops in other languages:

# RIGHT
for v in l:
    print v

Enter fullscreen mode Exit fullscreen mode

Iterate over a list in reverse order

How can we iterate a list in the reverse order? One way to do it would be to use an unreadable 3 arguments version of the range function and provide position of the last element in a list (first argument), position of an element before the first element in the list (second argument) and negative step to go in reverse order (third argument):

# WRONG
for i in range(len(l) - 1, -1, -1):
    print l[i]

Enter fullscreen mode Exit fullscreen mode

But as you've may already guessed Python should offer a much better way to do it. We can just use reversed function in a for loop:

# RIGHT
for i in reversed(l):
    print i

Enter fullscreen mode Exit fullscreen mode

Access the last element

A commonly used idiom to access the last element in a list would be: get length of a list, subtract 1 from it, use result number as a position of the last element:

# WRONG
l = [1, 2, 3, 4, 5]
>>> l[len(l) - 1]
5

Enter fullscreen mode Exit fullscreen mode

This is cumbersome in Python since it supports negative indexes to access elements from the end of the list. So -1 is the last element:

# RIGHT
>>> l[-1]
5
Enter fullscreen mode Exit fullscreen mode

Negative indexes can also be used to access a next to last element and so on:

# RIGHT
>>> l[-2]
4
>>> l[-3]
3
Enter fullscreen mode Exit fullscreen mode

Use sequence unpacking

A common way to extract values from a list to multiple variables in other programming languages would be to use indexes:

# WRONG
l1 = l[0]
l2 = l[1]
l3 = l[2]
Enter fullscreen mode Exit fullscreen mode

But Python supports sequence unpacking that lets us to extract values from a list to multiple variables:

# RIGHT
l1, l2, l3 = [1, 2, 3]

>>> l1
1
>>> l2
2
>>> l3
3
Enter fullscreen mode Exit fullscreen mode

Use lists comprehensions

Let's say we want to filter all grades for a movie posted by users of age 18 or bellow.

How many times did you write code like this:

# WRONG
under_18_grades = []
for grade in grades:
    if grade.age <= 18:
        under_18_grades.append(grade)

Enter fullscreen mode Exit fullscreen mode

Do it no more in Python and use list comprehensions with if statement instead.

# RIGHT
under_18_grades = [grade for grade in grades if grade.age <= 18]
Enter fullscreen mode Exit fullscreen mode

Use enumerate function

Sometimes you need to iterate over a list and keep track of a position of each element. Say, if you need to display a menu items in a shell you can simply use the range function:

# WRONG
for i in range(len(menu_items)):
    menu_items = menu_items[i]
    print "{}. {}".format(i, menu_items)
Enter fullscreen mode Exit fullscreen mode

A better way to do it would be to use enumerate function. It is a iterator that returns pairs each of which contains position of an element and the element itself:

# RIGHT
for i, menu_items in enumerate(menu_items):
    print "{}. {}".format(i, menu_items)
Enter fullscreen mode Exit fullscreen mode

Use keys to sort

A typical way to sort elements in other programming languages is to provide a function that compares two objects along with a collection to sort. In Python it would look like:

people = [Person('John', 30), Person('Peter', 28), Person('Joe', 42)]

# WRONG
def compare_people(p1, p2):
    if p1.age < p2.age:
        return -1
    if p1.age > p2.age:
        return 1
    return 0

sorted(people, cmp=compare_people)

[Person(name='Peter', age=28), Person(name='John', age=30), Person(name='Joe', age=42)]
Enter fullscreen mode Exit fullscreen mode

But this is not the best way to do it. Since all we need to do to compare two instances of Person class is to compare values of their age field. Why should we write a complex compare function for this?

Specifically for this case sorted function accepts key function that is used to extract a key that will be used to compare two instances of an object:

# RIGHT
sorted(people, key=lambda p: p.age)
[Person(name='Peter', age=28), Person(name='John', age=30), Person(name='Joe', age=42)]
Enter fullscreen mode Exit fullscreen mode

Use all/any functions

If you want to check if all or any value in a collection is True one way would be iterate over a list:

# WRONG
def all_true(lst):
    for v in lst:
        if not v:
            return False
    return True
Enter fullscreen mode Exit fullscreen mode

But Python already has all, any functions for that. all returns True if all values in an iterable passed to it are True, while any returns True if at least one of values passed to it is True:

# RIGHT
all([True, False])
>> False

any([True, False])
>> True
Enter fullscreen mode Exit fullscreen mode

To check if all items comply with a certain condition, you can convert a list of arbitrary objects to a list of booleans using list comprehension:

all([person.age > 18 for person in people])
Enter fullscreen mode Exit fullscreen mode

Or you can pass a generator (just omit square braces around the list comprehension):

all(person.age > 18 for person in people)
Enter fullscreen mode Exit fullscreen mode

Not only this will save you two keystrokes it will also omit creation of an intermediate list (more about this later).

Use slicing

You can take part of a list using a technique called slicing. Instead of providing a single index in a square brackets when accessing a list you can provide the following three values

lst[start:end:step]
Enter fullscreen mode Exit fullscreen mode

All of these parameters are optional and you can get different parts of a list if you omit some of them. If only start position is provided it will return all elements in a list starting from the specified index:

# RIGHT
>>> lst = range(10)
>>> lst[3:]
[3, 4, 5, 6, 7, 8, 9]
Enter fullscreen mode Exit fullscreen mode

If only end position is provided slicing will return all elements up to the provided position:

>>> lst[:-3]
[0, 1, 2, 3, 4, 5, 6]
Enter fullscreen mode Exit fullscreen mode

You can also get part of a list between two indexes:

>>> lst[3:6]
[3, 4, 5]
Enter fullscreen mode Exit fullscreen mode

By default step in slicing is equal to one which mean that all elements between start and end positions are returned. If you want to get only every second element or every third element you need to provide a step value:

>>> lst[2:8:2]
[2, 4, 6]
Enter fullscreen mode Exit fullscreen mode

Do not create unnecessary objects

Use xrange

range is a useful function if you need to generate consistent integer values in a range, but it has one drawback: it returns a list with all generated values:

# WRONG
# Returns a too big list
for i in range(1000000000):
    ...
Enter fullscreen mode Exit fullscreen mode

Solution here is to use xrange function. It immediately return an iterator instead of creating a list:

# RIGHT
# Returns an iterator
for i in xrange(1000000000):
    ...
Enter fullscreen mode Exit fullscreen mode

The drawback of xrange comparing to the range function is that it's output can be iterated only once.

New in Python 3

In Python 3 xrange was removed and range function behaves like xrange in Python 2.x. If you need to iterate over an output of range in Python 3 multiple times you can convert its output in to a list:

>>> list(range(10))
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Enter fullscreen mode Exit fullscreen mode

Use izip

If you need to generate pairs from elements in two collections, one way to do it would be to use the zip function:

# WRONG
names = ['Joe', 'Kate', 'Peter']
ages = [30, 28, 41]
# Creates a list
zip(names, ages)

[('Joe', 30), ('Kate', 28), ('Peter', 41)]
Enter fullscreen mode Exit fullscreen mode

Instead we can use the izip function that would return a return an iterator instead of creating a new list:

# RIGHT
from itertools import izip
# Creates an iterator
it = izip(names, ages)
Enter fullscreen mode Exit fullscreen mode

New in Python 3

In Python 3 izip function is removed and zip behaves like izip function in Python 2.x.

Use generators

Lists comprehensions is a powerful tool in Python, but since it can use extensive amount of memory since each list comprehension will create a new list:

# WRONG

# Original list
lst = range(10)
# This will create a new list
lst_1 = [i + 1 for i in lst]
# This will create another list
lst_2 = [i ** 2 for i in lst_1]

Enter fullscreen mode Exit fullscreen mode

A way to avoid this is to use generators instead of list comprehensions. The difference in syntax is minimal: you should use parenthesis instead of square brackets, but the difference is crucial. The following example does not create any intermediate lists:

# RIGHT

# Original list
lst = range(10)
# Won't create a new list
lst_1 = (i + 1 for i in lst)
# Won't create another list
lst_2 = (i ** 2 for i in lst_1)

Enter fullscreen mode Exit fullscreen mode

This is especially handy if you may need to process only part of the result collection to get a result, say to find a first element that match a certain condition.

Use dictionaries idiomatically

Avoid using keys() function

If you need to iterate over keys in a dictionary you may be inclined to use keys function on a hash map:

# WRONG
for k in d.keys():
    print k

Enter fullscreen mode Exit fullscreen mode

But there is a better way, you use iterate over a dictionary it performs iteration over its keys, so you can do simply:

# RIGHT
for k in d:
    print k
Enter fullscreen mode Exit fullscreen mode

Not only it will save you some typing it will prevent from creating a copy of all keys in a dict as keys method does.

Iterate over keys and values

If you use keys method it's really easy to iterate keys and values in a dictionary like this:


#WRONG
for k in d:
    v = d[k]
    print k, v

Enter fullscreen mode Exit fullscreen mode

But there is a better way. You can use items function that returns key-value pairs from a dictionary:

# RIGHT
for k, v in d.items():
    print k, v

Enter fullscreen mode Exit fullscreen mode

Not only this method is more concise, it's a more efficient too.

Use dictionaries comprehension

One way to create a dictionary is to assign values to it one-by-one:

# WRONG

d = {}
for person in people:
    d[person.name] = person


Enter fullscreen mode Exit fullscreen mode

Instead you can use a dictionary comprehension to turn this into a one liner:

# RIGHT
d = {person.name: person for person in people}
Enter fullscreen mode Exit fullscreen mode

Use collections module

Use namedtuple

If you need a struct like type you may just define a class with an init method and a bunch of fields:

# WRONG
class Point(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y
Enter fullscreen mode Exit fullscreen mode

However collections module from Python library provides a namedtuple type that turns this into a one-liner:

# RIGHT
from collections import namedtuple
Point = namedtuple('Point', ['x', 'y'])
Enter fullscreen mode Exit fullscreen mode

In addition namedtuple implements __str__, __repr__, and __eq__ methods:

>>> Point(1, 2)
Point(x=1, y=2)
>>> Point(1, 2) == Point(1, 2)
True
Enter fullscreen mode Exit fullscreen mode

Use defaultdict

If we need to count a number of times an element is encountered in a collection, we can use a common approach:

# WRONG
d = {}
for v in lst:
    if v not in d:
        d[v] = 1
    else:
        d[v] += 1
Enter fullscreen mode Exit fullscreen mode

collections module provides a very handy class for this case which is called defaultdict. It's constructor accepts a function that will be used to calculate a value for a non-existing key:

>>> d = defaultdict(lambda: 42)
>>> d['key']
42
Enter fullscreen mode Exit fullscreen mode

To rewrite counting example we can pass the int function to defaultdict which returns zero if called with no arguments:

# RIGHT
from collections import defaultdict
d = defaultdict(int)
for v in lst:
    d[v] += 1
Enter fullscreen mode Exit fullscreen mode

defaultdict is useful when you need to create any kind of grouping of items in a collection, but you just need to get count of elements you may use Counter class instead:

# RIGHT
from collections import Counter

>>> counter = Counter(lst)
>>> counter
Counter({4: 3, 1: 2, 2: 1, 3: 1, 5: 1})
Enter fullscreen mode Exit fullscreen mode

This post was originally posted at Brewing Codes blog.

. . . . . . .