Python's 'heapq' module

Kelvin Wangonya - Dec 21 '18 - - Dev Community

Often when working with collections of data, you may want to find the smallest or largest item. It's easy enough to write a function that iterates through the items and returns the smallest or largest one, or use the builtin min(), max(), or sorted() functions. Another interesting way may be implementing a heap (priority) queue.

Python provides a pretty convenient module called heapq that does that for you. heapq comes with a cool set of inbuilt functions that you can read more about in the docs. For this short post, I just want to show how you can easily find the smallest and largest items in a collection

Finding the smallest items

Find 3 of the smallest items using the nsmallest function:

import heapq

numbers = [9, 45, 21, 4, 63, 3, 109]

print(heapq.nsmallest(3, numbers)) # [3, 4, 9]
Enter fullscreen mode Exit fullscreen mode

Also, converting a list into a heap using the heapify function automatically sets the smallest item as the first:

import heapq

numbers = [9, 45, 21, 4, 63, 3, 109]

heapq.heapify(numbers)
print(numbers)
# Output: [3, 4, 9, 45, 63, 21, 109]
Enter fullscreen mode Exit fullscreen mode

You can then pop from the heap with heappop():

heapq.heappop(numbers) # 3
print(numbers)
# Output: [4, 45, 9, 109, 63, 21]

heapq.heappop(numbers) # 4
print(numbers)
# Output: [9, 45, 21, 109, 63]
Enter fullscreen mode Exit fullscreen mode

Finding the largest items

Find 3 of the largest items using the nlargest function:

import heapq

numbers = [9, 45, 21, 4, 63, 3, 109]

print(heapq.nlargest(3, numbers)) # [109, 63, 45]
Enter fullscreen mode Exit fullscreen mode

A more practical example

import heapq

people = [
    {'fname': 'John', 'lname': 'Doe', 'age': 30},
    {'fname': 'Jane', 'lname': 'Doe', 'age': 25},
    {'fname': 'Janie', 'lname': 'Doe', 'age': 10},
    {'fname': 'Jane', 'lname': 'Roe', 'age': 22},
    {'fname': 'Johnny', 'lname': 'Doe', 'age': 12},
    {'fname': 'John', 'lname': 'Roe', 'age': 45}
]

oldest = heapq.nlargest(2, people, key=lambda s: s['age'])
print(oldest)
# Output: [{'fname': 'John', 'lname': 'Roe', 'age': 45}, {'fname': 'John', 'lname': 'Doe', 'age': 30}]

youngest = heapq.nsmallest(2, people, key=lambda s: s['age'])
print(youngest)
# Output: [{'fname': 'Janie', 'lname': 'Doe', 'age': 10}, {'fname': 'Johnny', 'lname': 'Doe', 'age': 12}]
Enter fullscreen mode Exit fullscreen mode

It should be noted though that the nsmallest(n, iterable) and nlargest(n, iterable) functions perform best for smaller values of n. For larger values, it is more efficient to use the sorted() function. Also, when n==1, it is more efficient to use the builtin min() and max() functions.

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .