A Study of Python's More Advanced Features Part I: Iterators, Generators, itertools

There are many aspects of Python that appeal to mathematicians. To mention only a few: built-in support for tuples, lists, and sets, with notation that closely mirrors conventional mathematical notation, and list comprehensions, which closely resemble set comprehensions and the "set-builder" notation used for them.

Another set of features that are very appealing to the mathematically-minded are Python's iterators and generators, and the related itertools package. These tools make it easy to write elegant code that deals with such mathematical objects as infinite sequences, stochastic processes, recurrence relations, and combinatorial structures. This post covers my notes on iterators and generators, and goes over some of my experiments while I was learning more about these features of Python.

Iterators

Iterators are objects that allow iteration over a collection. Such collections need not be of objects that already exist in memory, and because of this, they need not necessarily be finite. For official documentation, see "Iterator Types" in Python's documentation.

Let's be a bit more precise in our definitions. An iterable is defined as an object that has an __iter__ method, which is required to return an iterator object. An iterator in turn is an object that has the two methods __iter__ and next (or __next__ in Python 3), with the former returning an iterator object and the latter returning the next element of the iteration. As far as I know, iterators always simply return self in their __iter__ method because they are their own iterators.

Generally speaking, you should avoid calling __iter__ and next directly. If you use a for loop or a list comprehension, Python calls them for you automatically. If you do have to call them manually, use Python's built-in functions iter and next and pass the iterator or container as a parameter. For example, if c is an iterable, use iter(c) instead of c.__iter__(), and similarly, if a is an iterator, use next(a) instead of a.next(). This is analogous to how len is used.

Speaking of len, it is worth mentioning that iterators need not have a well-defined length, and often they do not; accordingly, they usually do not implement __len__. If you want to count the number of items in an iterator, you have to do so manually, for example with a generator expression inside sum. An example is shown at the end of this article, in the "itertools Module" section.
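As a small illustration of that idiom (a minimal sketch; note that counting this way consumes the iterator):

```python
# Count the items of an iterator by consuming it; iterators have no len().
it = iter([10, 20, 30])
count = sum(1 for _ in it)
print(count)  # prints 3
```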

Some iterables are not iterators and instead have other objects as their iterators. For example, the list object is an iterable but not an iterator (it implements __iter__ but not next). Iterators for list objects are of type listiterator as you can see in the example below. Also notice how while list objects have a well-defined length, listiterator objects do not.

>>> a = [1, 2]
>>> type(a)
<type 'list'>
>>> type(iter(a))
<type 'listiterator'>
>>> it = iter(a)
>>> next(it)
1
>>> next(it)
2
>>> next(it)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
>>> len(a)
2
>>> len(it)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: object of type 'listiterator' has no len()

When an iterator is exhausted, the interpreter expects it to raise a StopIteration exception. However, as mentioned, iterators can iterate over infinite collections; for such iterators, it is the user's responsibility to ensure their use does not lead to an infinite loop. An example is shown shortly.

A very basic example of an iterator is given below. This iterator will simply start counting at 0 and go up indefinitely. It is a simpler version of itertools.count.

class count_iterator(object):
    def __init__(self):
        self.n = 0

    def __iter__(self):
        return self

    def next(self):
        y = self.n
        self.n += 1
        return y

    __next__ = next  # Python 3 compatibility

A usage example follows. Note that the last line attempts to convert the iterator object to a list, which results in an infinite loop, since this particular iterator never ends.

>>> counter = count_iterator()
>>> next(counter)
0
>>> next(counter)
1
>>> next(counter)
2
>>> next(counter)
3
>>> list(counter)  # This will result in an infinite loop!
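If you only want a finite prefix of such an infinite iterator, itertools.islice is a safe alternative to list(). A small sketch using itertools.count, the library version of the counter above:

```python
import itertools

# islice lazily takes the first few items of a possibly infinite iterator,
# so there is no risk of an infinite loop here.
first_five = list(itertools.islice(itertools.count(), 5))
print(first_five)  # prints [0, 1, 2, 3, 4]
```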

Finally, to be accurate, we should make an amendment to the above: objects that do not have an __iter__ method can still be iterable if they define __getitem__. In this case, Python's built-in function iter returns an iterator of type iterator for the object, which uses __getitem__ to go over the items in sequence, starting at index 0. If __getitem__ raises either StopIteration or IndexError, the iteration stops. Let's see an example of this:

class SimpleList(object):
    def __init__(self, *items):
        self.items = items

    def __getitem__(self, i):
        return self.items[i]

And its use:

>>> a = SimpleList(1, 2, 3)
>>> it = iter(a)
>>> next(it)
1
>>> next(it)
2
>>> next(it)
3
>>> next(it)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
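Because iter builds the iterator for us, such objects also work directly in for loops and with list. A quick check, reusing the SimpleList class above:

```python
class SimpleList(object):
    def __init__(self, *items):
        self.items = items

    def __getitem__(self, i):
        return self.items[i]

# The __getitem__-based protocol is enough for list() and for loops to work.
print(list(SimpleList(1, 2, 3)))  # prints [1, 2, 3]
```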

Now let's look at a more interesting example: using an iterator to generate the Hofstadter Q sequence from given initial conditions. Hofstadter first mentioned this nested recurrence in his book "Gödel, Escher, Bach: An Eternal Golden Braid", and proving that the sequence is well-defined for all values of \(n\) has remained an open problem since. The code below uses an iterator to generate the sequence given by the nested recurrence

\begin{equation*} Q(n) = Q(n-Q(n-1)) + Q(n-Q(n-2)) \end{equation*}

given a list of initial conditions. For example, qsequence([1, 1]) generates exactly the Hofstadter Q sequence. We use the StopIteration exception to indicate that the sequence cannot go any further because an invalid index would be required to generate the next element. For example, if the initial conditions are [1, 3], then the sequence "terminates" immediately, since computing the next element would require Q(0).

class qsequence(object):
    def __init__(self, s):
        self.s = s[:]

    def next(self):
        try:
            q = self.s[-self.s[-1]] + self.s[-self.s[-2]]
            self.s.append(q)
            return q
        except IndexError:
            raise StopIteration()

    __next__ = next  # Python 3 compatibility

    def __iter__(self):
        return self

    def current_state(self):
        return self.s

And here's how you can use it:

>>> Q = qsequence([1, 1])
>>> next(Q)
2
>>> next(Q)
3
>>> [next(Q) for __ in xrange(10)]
[3, 4, 5, 5, 6, 6, 6, 8, 8, 8]

Generators

Generators are iterators that are defined using a simpler, function-based notation. More specifically, a generator is a function that uses the yield expression somewhere in its body. Generators do not return values in the usual sense; instead they yield results as they become ready. Python automates remembering the generator's context, that is, where its control flow currently is, the values of its local variables, and so on. Each time the generator is advanced using next, it yields the next value in the iteration. The __iter__ method is implemented automatically as well, meaning generators can be used anywhere an iterator is needed. Let's look at an example. This implementation is functionally equivalent to our iterator class above, but is much more compact and readable.

def count_generator():
    n = 0
    while True:
        yield n
        n += 1

Now let's see it in action:

>>> counter = count_generator()
>>> counter
<generator object count_generator at 0x106bf1aa0>
>>> next(counter)
0
>>> next(counter)
1
>>> iter(counter)
<generator object count_generator at 0x106bf1aa0>
>>> iter(counter) is counter
True
>>> type(counter)
<type 'generator'>

Now, let's implement Hofstadter's Q sequence using a generator. Notice how much simpler the implementation is. However, we can no longer implement methods such as current_state above. As far as I know, it is NOT possible to access the variables stored in a generator's context from outside the generator, so current_state cannot be exposed from the generator object. (OK, there is the gi_frame.f_locals dictionary, but it is specific to CPython rather than a standard part of the language, so its use is not recommended.) One possible solution in these circumstances is to yield both the result and the list so far, but I will leave that as an exercise.

def hofstadter_generator(s):
    a = s[:]
    while True:
        try:
            q = a[-a[-1]] + a[-a[-2]]
            a.append(q)
            yield q
        except IndexError:
            return

Notice that a simple return statement with no parameter ends the iteration of a generator. Internally, this raises a StopIteration exception.
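As a quick check of that IndexError-to-return path, consider initial conditions for which the recurrence immediately needs an unavailable element, such as [1, 3]; the generator then yields nothing at all. This reuses the generator defined above:

```python
def hofstadter_generator(s):
    # Same generator as above: IndexError ends the iteration via return.
    a = s[:]
    while True:
        try:
            q = a[-a[-1]] + a[-a[-2]]
            a.append(q)
            yield q
        except IndexError:
            return

# With Q(1) = 1, Q(2) = 3, the next element needs Q(3 - Q(2)) = Q(0),
# which does not exist, so iteration stops before yielding anything.
print(list(hofstadter_generator([1, 3])))  # prints []
```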

The next generator example comes from the Groupon randomness extraction interview question (see the article here). We use two generators: first to implement a Bernoulli process, an infinite sequence of random boolean values with True having probability \(p\) and False probability \(q=1-p\), and then to implement a von Neumann extractor that takes a Bernoulli process with \(0<p<1\) as a source of entropy and returns a fair Bernoulli process, with \(p=0.5\).

import random


def bernoulli_process(p):
    if p > 1.0 or p < 0.0:
        raise ValueError("p should be between 0.0 and 1.0.")

    while True:
        yield random.random() < p


def von_neumann_extractor(process):
    while True:
        x, y = next(process), next(process)
        if x != y:
            yield x
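Here is a short, seeded run of the two generators together, a sketch using itertools.islice to take finitely many bits and next(process) as the portable way to advance the process:

```python
import itertools
import random


def bernoulli_process(p):
    # Infinite sequence of booleans, True with probability p.
    if p > 1.0 or p < 0.0:
        raise ValueError("p should be between 0.0 and 1.0.")
    while True:
        yield random.random() < p


def von_neumann_extractor(process):
    # Read input bits in pairs; emit the first bit of each unequal pair.
    while True:
        x, y = next(process), next(process)
        if x != y:
            yield x


random.seed(42)  # seeded so the run is repeatable
bits = list(itertools.islice(von_neumann_extractor(bernoulli_process(0.3)), 12))
print(len(bits))  # prints 12
```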

Finally, generators are great tools for implementing discrete dynamical systems. The following example shows how the famous tent map dynamical system can be implemented nicely using generators. (On a side note, have a look at how numerical inaccuracies begin to creep in relatively fast and then grow exponentially, a key feature of dynamical systems such as the tent map.)

>>> def tent_map(mu, x0):
...    x = x0
...    while True:
...        yield x
...        x = mu * min(x, 1.0 - x)
...
>>>
>>> t = tent_map(2.0, 0.1)
>>> for __ in xrange(30):
...    print next(t)
...
0.1
0.2
0.4
0.8
0.4
0.8
0.4
0.8
0.4
0.8
0.4
0.8
0.4
0.8
0.4
0.8
0.4
0.799999999999
0.400000000001
0.800000000003
0.399999999994
0.799999999988
0.400000000023
0.800000000047
0.399999999907
0.799999999814
0.400000000373
0.800000000745
0.39999999851
0.79999999702

Another somewhat similar example would be the Collatz sequence.

def collatz(n):
    yield n
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        yield n

Notice that in this example too we do not have to raise StopIteration manually; it is raised automatically when control flow reaches the end of the function.

Example of the collatz generator in use:

>>> # If the Collatz conjecture is true then list(collatz(n)) for any n will
... # always terminate (though your machine might run out of memory first!)
>>> list(collatz(7))
[7, 22, 11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1]
>>> list(collatz(13))
[13, 40, 20, 10, 5, 16, 8, 4, 2, 1]
>>> list(collatz(17))
[17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1]
>>> list(collatz(19))
[19, 58, 29, 88, 44, 22, 11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1]
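Tying this back to the earlier point about counting an iterator's items: the total stopping time of \(n\) (the number of steps needed to reach 1) can be computed without materializing the list, by summing inside a generator expression. A sketch, with // used for integer division so it also runs under Python 3:

```python
def collatz(n):
    yield n
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        yield n

# Number of steps to reach 1, i.e. sequence length minus the starting term.
steps = sum(1 for _ in collatz(27)) - 1
print(steps)  # prints 111
```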

Recursive Generators

Generators can be recursive, just like any other function. Let's see an example of this by implementing our own simpler version of itertools.permutations, namely a generator that generates all permutations of a given list of items. (Note: in practice, use itertools.permutations instead of the method given here. It's faster!) The basic idea is simple: for each element of the list, we swap it into the first position and then recursively permute the rest of the list.

def permutations(items):
    if len(items) == 0:
        yield []
    else:
        pi = items[:]
        for i in xrange(len(pi)):
            pi[0], pi[i] = pi[i], pi[0]
            for p in permutations(pi[1:]):
                yield [pi[0]] + p

>>> for p in permutations([1, 2, 3]):
...     print p
...
[1, 2, 3]
[1, 3, 2]
[2, 1, 3]
[2, 3, 1]
[3, 1, 2]
[3, 2, 1]
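A simple way to gain confidence in the recursive version is to compare its output against itertools.permutations (here with range instead of xrange so the check also runs under Python 3). The two produce the same set of permutations, though not necessarily in the same order, so we sort before comparing:

```python
import itertools


def permutations(items):
    # Recursive generator from above, unchanged except for range.
    if len(items) == 0:
        yield []
    else:
        pi = items[:]
        for i in range(len(pi)):
            pi[0], pi[i] = pi[i], pi[0]
            for p in permutations(pi[1:]):
                yield [pi[0]] + p


ours = sorted(permutations([1, 2, 3, 4]))
lib = sorted(list(p) for p in itertools.permutations([1, 2, 3, 4]))
print(ours == lib)  # prints True
```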

Generator Expressions

Generator expressions let you define generators using an even simpler, inline notation. This notation is very similar to Python's list comprehension notation. For example, the following provides a generator that iterates over all perfect squares. Note that the result of a generator expression is an object of type generator, and as such it implements both next and __iter__.

>>> import itertools
>>> g = (x ** 2 for x in itertools.count(1))
>>> g
<generator object <genexpr> at 0x1029a5fa0>
>>> next(g)
1
>>> next(g)
4
>>> iter(g)
<generator object <genexpr> at 0x1029a5fa0>
>>> iter(g) is g
True
>>> [next(g) for __ in xrange(10)]
[9, 16, 25, 36, 49, 64, 81, 100, 121, 144]

We can also implement the Bernoulli process of the previous section as a generator expression, in this example with \(p=0.4\). If a generator expression requires another iterator as its "loop counter", itertools.count is a good choice when the expression produces an infinite sequence; otherwise xrange can be used.

>>> g = (random.random() < 0.4 for __ in itertools.count())
>>> [next(g) for __ in xrange(10)]
[False, False, False, True, True, False, True, False, False, True]

As mentioned, generator expressions can be passed to any function that expects an iterable as an argument. For example, we can sum the first ten perfect squares by writing:

>>> sum(x ** 2 for x in xrange(10))
285

More examples of generator expressions are given in the next section.

The itertools Module

The module itertools provides a set of iterators that can help simplify working with permutations, combinations, Cartesian products, and other such combinatorial constructs. You can read up on all the functions here.

Before reading the rest, note that none of the algorithms given below are optimal in practice, and are only presented here as examples of how to use permutations and combinations. In practice, you should try to avoid enumerating all permutations or combinations unless you have good reason to believe that it is necessary to do so (e.g. you know the problem is NP-complete), as such enumerations grow exponentially in size.

Now let's go over some interesting use cases. For our first example, let's look at an alternative way of writing a relatively common pattern: looping over all indices of a 3-dimensional array, and also looping over all indices with \(0 \le i < j < k < n\).

from itertools import combinations, product


n = 4
d = 3


def visit(*indices):
    print indices

# Loop through all possible indices of a 3-D array
for i in xrange(n):
    for j in xrange(n):
        for k in xrange(n):
            visit(i, j, k)

# Equivalent using itertools.product
for indices in product(*([xrange(n)] * d)):
    visit(*indices)

# Now loop through all indices 0 <= i < j < k < n
for i in xrange(n):
    for j in xrange(i + 1, n):
        for k in xrange(j + 1, n):
            visit(i, j, k)

# And equivalent using itertools.combinations
for indices in combinations(xrange(n), d):
    visit(*indices)

The alternatives using itertools have two advantages: they are shorter to write (one line instead of three in this case), and they are much easier to generalize to higher dimensions. That said, I have not tested the difference in performance between the two, which might be significant for larger \(n\). Use your best judgement, and run performance tests if you are concerned.
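It is easy to check the two equivalences programmatically. The sketch below uses range and product's repeat keyword (a slightly neater way to write the d-fold product than unpacking a repeated list) and compares the nested loops' output with the itertools versions:

```python
import itertools

n, d = 4, 3

# Nested loops vs. the d-fold Cartesian product; product varies the
# rightmost position fastest, matching the nested-loop order.
nested = [(i, j, k) for i in range(n)
          for j in range(n)
          for k in range(n)]
assert nested == list(itertools.product(range(n), repeat=d))

# Strictly increasing index triples vs. combinations, which emits
# tuples in lexicographic order.
increasing = [(i, j, k) for i in range(n)
              for j in range(i + 1, n)
              for k in range(j + 1, n)]
assert increasing == list(itertools.combinations(range(n), d))
print(len(nested), len(increasing))  # prints 64 4
```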

For the second example, let's do something more mathematically interesting: use generator expressions together with itertools.combinations and itertools.permutations to calculate the inversion number of a permutation, and then sum the inversion numbers of all the permutations of a list. The sum of the inversion numbers of all permutations of \(n\) elements is OEIS A001809, and it is relatively easy to show that it has the closed form \(\frac{n!\,n(n-1)}{4}\). In practice, using this formula is much more efficient than the code given below, but we are writing the code as an exercise in using itertools.

import itertools
import math


def inversion_number(A):
    """Return the number of inversions in list A."""
    return sum(1 for x, y in itertools.combinations(xrange(len(A)), 2) if A[x] > A[y])


def total_inversions(n):
    """Return total number of inversions in permutations of n."""
    return sum(inversion_number(A) for A in itertools.permutations(xrange(n)))

And usage:

>>> [total_inversions(n) for n in xrange(10)]
[0, 0, 1, 9, 72, 600, 5400, 52920, 564480, 6531840]

>>> [math.factorial(n) * n * (n - 1) // 4 for n in xrange(10)]
[0, 0, 1, 9, 72, 600, 5400, 52920, 564480, 6531840]

For the third example, let's calculate the rencontres numbers using a brute-force counting method. Given \(n\) and \(k\), the rencontres number \(D_{n,k}\) is defined as the number of permutations of a set of size \(n\) that have exactly \(k\) fixed points. First we write a function that uses a generator expression inside a sum to count the number of fixed points of a permutation. Then, using itertools.permutations and another generator expression inside a sum, we count the number of permutations of \(n\) elements that have exactly \(k\) fixed points. Of course, this is a very inefficient way of calculating rencontres numbers and should not be used in practice; again, it is just an exercise in using generator expressions and itertools functions, and is not meant to be usable code.

def count_fixed_points(p):
    """Return the number of fixed points of p as a permutation."""
    # p is a permutation of 0..n-1, so iterating over its values visits
    # every valid index exactly once.
    return sum(1 for x in p if p[x] == x)


def count_partial_derangements(n, k):
    """Return the number of permutations of n with k fixed points."""
    return sum(1 for p in itertools.permutations(xrange(n)) if count_fixed_points(p) == k)

Usage:

>>> [count_partial_derangements(6, i) for i in xrange(7)]
[265, 264, 135, 40, 15, 0, 1]
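As a sanity check, the brute-force counts can be compared against the standard closed form \(D_{n,k} = \binom{n}{k}\,!(n-k)\), where \(!m\) is the number of derangements of \(m\) elements. (This formula is not from the code above; the sketch below computes \(!m\) via the subfactorial recurrence \(!m = m\cdot!(m-1) + (-1)^m\).)

```python
import math


def subfactorial(m):
    # Number of derangements of m elements: !m = m * !(m-1) + (-1)**m, !0 = 1.
    d = 1
    for i in range(1, m + 1):
        d = i * d + (-1) ** i
    return d


def rencontres(n, k):
    # D_{n,k} = C(n, k) * !(n - k): choose the fixed points, derange the rest.
    binom = math.factorial(n) // (math.factorial(k) * math.factorial(n - k))
    return binom * subfactorial(n - k)


print([rencontres(6, k) for k in range(7)])  # prints [265, 264, 135, 40, 15, 0, 1]
```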

Further Reading

Pramode's article on generators contains a very interesting example implementing an efficient but simple sieve of Eratosthenes.

David Beazley also has an excellent page on generators found here. Make sure to check it out for more interesting examples and use cases.

Acknowledgments

Some improvements to this article were made based on suggestions by reddit user jms_nh.
