You Should Use Python Sets

Dylan Anthony - Jul 29 '19 - - Dev Community

This post contains references to Big O notation1, if you aren't familiar with it (or don't remember how it works) now is a good time for a refresher.

There are a ton of data structures available in the standard Python library, most of which go underutilized. Today I'd like to focus on one type in particular you should probably be using more than you are: the set. You may be already aware of the general concept of sets in computer science2, the Python set3 is exactly that.

What is it?

For those of you who don't already know, a set is an unordered collection of hashable objects which cannot contain any duplicates. The hashable part is key here, much like the keys in a dict4 (A.K.A. associative array 5), each entry is hashed to create an index into the underlying data structure. You can think of a set much like a dict without any values.

When to use it?

A set is an extremely useful alternative to the more popular list in several scenarios:

  1. Looking for membership (e.g. the in operator). The lookup time for a set is O(1) whereas for a list it's O(n)6.
  2. If you'll be removing members, same story as in here regarding complexity.
  3. If you need to compare, contrast, or combine collections - the set has operators for all of these.
  4. If you cannot have any duplicate values in the collection.

When not to use it?

  1. If you need to have more than one copy of the same value in the collection.
  2. If the order of the objects matters
  3. If the objects in your collection are not hashable7

How to use it?

set_with_values = {'a', 'b', 'c'}
other_set = set()  # Note {} would be a dict

assert 'a' in set_with_values  # Super speedy operation

other_set.add('c')

assert other_set < set_with_values  # This means other_set is a subset of set_with_values

assert set_with_values > other_set  # Super set

other_set.add('d')

# Set defines a bunch of useful operators
assert set_with_values | other_set == {'a', 'b', 'c', 'd'}
assert set_with_values & other_set == {'c'}
assert set_with_values - other_set == {'a', 'b'}
assert set_with_values ^ other_set == {'a', 'b', 'd'}

set_with_values |= other_set  # You can perform updates with all of those operators
assert set_with_values == {'a', 'b', 'c', 'd'}  # Note only 1 copy of 'c'

set_with_values.remove('d')  # Again, super fast compared to lists
Enter fullscreen mode Exit fullscreen mode

Conclusion

If you aren't already using sets all over your code, you probably should be. Basically any time you're using in with a list or del with a list, you should consider using a different structure.


  1. https://en.wikipedia.org/wiki/Big_O_notation 

  2. https://en.wikipedia.org/wiki/Set_(abstract_data_type) 

  3. https://docs.python.org/3.7/library/stdtypes.html#set-types-set-frozenset 

  4. https://docs.python.org/3.7/library/stdtypes.html#mapping-types-dict 

  5. https://en.wikipedia.org/wiki/Associative_array 

  6. https://www.ics.uci.edu/~pattis/ICS-33/lectures/complexitypython.txt 

  7. https://docs.python.org/3/glossary.html#term-hashable 

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .