A Tour of the Python Collections Library: Part 1

One of my favorite things about working with Python on a daily basis is how robust the Python standard library is. In particular, Python has a huge range of prebuilt data types that can significantly enhance productivity, code quality and complexity management. For me, so many of my favorites are in the collections module. Today, I want to present three of my favorite Python types: Deques, NamedTuples and DefaultDicts.

Deques

Deques serve as great go-to data structures for either stacks or queues (although it’s worth noting that queue.Queue is built for passing data between threads and handles its own locking, so it is the better choice for that application). Because a deque is implemented as a doubly-linked list, elements can be added to or removed from either end in constant time. Indexing into the middle of a deque is slower, so it is not a drop-in replacement for a list.

I’m a strong believer that code examples are the best way to communicate about code ideas, so let’s build a deque here:

from collections import deque
deq = deque()

# Append to end of queue
deq.append("potato")
deq.append("other potato")
deq.append("last potato")

# Get first and last value
first = deq.popleft() # "potato"
last = deq.pop() # "last potato"

# Add to front of queue
deq.appendleft("new potato")

# Create a new deque with maxlen
short_deque = deque(maxlen=5)

While the standard library docs are very detailed and well-organized, they tend not to cover example use cases. Deques are useful any time you need to process data in a first in, first out (FIFO) or last in, first out (LIFO) manner. For instance, a deque can hold a list of undo operations.
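To make the undo idea concrete, here is a minimal sketch. The action strings and the limit of three entries are made up for illustration. It also shows what maxlen does: once the deque is full, each append silently drops the oldest item from the opposite end.

```python
from collections import deque

# Hypothetical undo history that keeps only the 3 most recent actions.
undo_history = deque(maxlen=3)

for action in ["type 'a'", "type 'b'", "delete 'b'", "type 'c'"]:
    # Once the deque is full, appending drops the oldest entry from the left.
    undo_history.append(action)

# The first action ("type 'a'") has already been discarded.
remaining = list(undo_history)

# Undo is LIFO: take the most recent action off the right end.
last_action = undo_history.pop()

print(remaining)
print(last_action)
```

A bounded history like this keeps memory use flat no matter how long the session runs, at the cost of forgetting older actions.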

Namedtuples

Tuples are great for constant-time lookups, destructuring return values and holding immutable data, but they introduce some cognitive overhead through their obscurity. For instance, consider this tuple:

data = ('Falafel Gyro', 17, False, 191)
data[0]
> 'Falafel Gyro'

While it’s clear what the values here ARE, it’s certainly not clear what they represent. Namedtuples elegantly solve this problem:

from collections import namedtuple

MenuItem = namedtuple('MenuItem', ['name', 'price', 'gluten_free', 'calories'])

falafel = MenuItem('Falafel Gyro', 17, False, 191)

falafel.name
> 'Falafel Gyro'

falafel.calories
> 191

Without changing any of the basic functionality, we have improved our code tremendously. Raymond Hettinger, inventor of the namedtuple and one of the best Python teachers out there, provides other great example use cases in this excellent video.
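Here is a short sketch of what "without changing any of the basic functionality" means in practice. It reuses the MenuItem example from above; the discounted price of 15 is just an illustrative value. A namedtuple still indexes and unpacks like a plain tuple, it is still immutable, and it picks up a couple of helper methods along the way.

```python
from collections import namedtuple

MenuItem = namedtuple('MenuItem', ['name', 'price', 'gluten_free', 'calories'])
falafel = MenuItem('Falafel Gyro', 17, False, 191)

# Still a tuple: indexing and unpacking work exactly as before.
first_field = falafel[0]
name, price, gluten_free, calories = falafel

# Still immutable: assigning to a field raises AttributeError.
try:
    falafel.price = 15
    mutated = True
except AttributeError:
    mutated = False

# _replace returns a new instance and leaves the original untouched.
discounted = falafel._replace(price=15)

# _asdict converts the fields to a mapping, handy for serialization.
as_dict = falafel._asdict()

print(discounted)
print(as_dict)
```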

If you know of a good falafel sandwich with fewer than 200 calories, please email me immediately.

DefaultDicts

We’re accustomed to dicts that raise a KeyError when we access a nonexistent key, e.g.:


In [1]: potato_tracker = {'home': 55, 'work': 12}

In [2]: potato_tracker['home']
Out[2]: 55

In [3]: potato_tracker['garden']
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-6-86a01747b0ab> in <module>()
----> 1 potato_tracker['garden']

KeyError: 'garden'

This exception serves us in a lot of cases; it ensures that we actually have the data we’re trying to access. However, in my potato inventory system, I can just assume that I don’t have any potatoes anywhere I don’t have an entry. It would make sense, then, just to return a zero value in these cases.

 
In [1]: from collections import defaultdict

In [2]: default_tracker = defaultdict(int)

In [3]: default_tracker['home'] = 55

In [4]: default_tracker['work'] = 12

In [5]: default_tracker['home']
Out[5]: 55


In [6]: default_tracker['Dublin']
Out[6]: 0
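One behavior worth knowing about: looking up a missing key with square brackets does not just return the default, it also stores it in the dict. The sketch below shows that side effect, plus the running-total pattern where defaultdict(int) really shines. The delivery numbers are made up for illustration.

```python
from collections import defaultdict

default_tracker = defaultdict(int)
default_tracker['home'] = 55

# Reading a missing key calls int(), stores the result, and returns it.
count = default_tracker['Dublin']
keys_after_lookup = sorted(default_tracker)  # 'Dublin' is now a real key

# .get() and `in` do not trigger the default factory.
missing = default_tracker.get('garden')
has_garden = 'garden' in default_tracker

# Running totals need no "is the key there yet?" check.
for location, delivered in [('work', 12), ('home', 5), ('work', 3)]:
    default_tracker[location] += delivered

print(dict(default_tracker))
```

If you only want to peek at a value without creating an entry, .get() is the safer choice.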

That’s a good overview of the basics for these datatypes. Soon, I’ll be discussing the rest of the collections module. In the meantime, thanks for stopping by!