## Introduction

This is the second post in the my Interview Question Deep Dive series, the first one of which was about generating balanced parentheses. In this post, we will start by looking at a simple question about flattening an iterator of iterators, and consider a slightly harder variation of it, and finally see how our solution to the harder variation is analogous to an important theorem in a beautiful area of mathematics, namely set theory, which Hilbert referred to as Cantor's paradise, hence the title of the article.

## Understanding the problem

Here's the coding question I was asked that inspired this post:

Write a program that flattens an iterator that iterates over iterators.

Reading that, the question might come across as a bit confusing, both in terms of what an "iterator" is and what "flattening" means. Let's clarify both of these, specifically in the context of Python as the language we are using. If you are already familiar with iterators, you can skip the rest of this section and jump to the next section for the first solution.

Abstractly speaking, an iterator is an object that allows visiting a set of
objects one at a time and in sequence. An iterator should retain enough state
to go from one element to the next and should be able to determine when we are
at the end of the set. It's important to mention early on that iterators do
*not* have to be finite, and do *not* have to iterate over objects that are in
fully present in memory (e.g., file iterators, or an iterator that returns the
next Tweet corresponding to a hashtag, and so on). An iterator can keep
returning new elements each time it is called without ever reaching the end of
the set.

Let's consider the above definition more concretely by looking at how iterators
work in Python, which is the language we will use for our solutions. In Python,
we define an iterator as any object that has a `__next__` function, and that
behaves as follows. Calling `__next__` should return the next object of the
set the iterator is iterating over. If no next element exists, a
`StopIteration` exception is thrown. It is also assumed that once a
`StopIteration` exception is thrown, the iterator has finished iterating and
any future calls to `__next__` will also throw a `StopIteration` exception.

Objects for which an associated iterator class exists are often called
"iterables". In Python, iterables are defined as objects that have an
`__iter__` member function. Calling that member function should return a new
instance of an iterator for that object. Note that you usually do not call
`__iter__` and `__next__` directly, but rather use the `iter()` and
`next()` functions to call them. For example, let's look at how we can
iterate over the same list using two different iterators:

```
In [1]: a = [1, 2, 3]
In [2]: it1 = iter(a)
In [3]: it2 = iter(a)
In [4]: next(it1)
Out[4]: 1
In [5]: next(it1)
Out[5]: 2
In [6]: next(it2)
Out[6]: 1
In [7]: next(it1)
Out[7]: 3
In [8]: next(it1)
---------------------------------------------------------------------------
StopIteration Traceback (most recent call last)
```

Note how the two iterators are iterating over the same list independently.

To loop over an iterator, we can call the `next` function until a
`StopIteration` exception is thrown. For example, the following will print
every element given an iterator:

```
def print_all(it):
while True:
try:
print(next(it))
except StopIteration:
return
```

Of course, if you're familiar with iterators in Python you're probably thinking
that's silly. And you're right. It is silly because Python's `for` loops are
a shortcut for the above! Given any iterator you can loop over all their
elements using a `for` loop:

```
def print_all(it):
for x in it:
print(x)
```

To demonstrate how iterators can be infinite, let's consider the simple example
of Python's `itertools.count` iterator:

```
In [1]: from itertools import count
In [2]: it = count()
In [3]: next(it)
Out[3]: 0
In [4]: next(it)
Out[4]: 1
In [5]: next(it)
Out[5]: 2
```

As you can see, an instance of `itertool.count` simply counts up, starting
from `0` and onward, and will never stop (theoretically eventually it would
run out of memory needed to keep the state which is the current number, but
that would take a **very** long time given that Python will switch over to
using a big integer once a regular integer overflows).

Next let's consider what "flatten" means in the context of this question. To
understand it, consider a list of lists: `[[1, 2], [3, 4], [], [5]]`. To
flatten this list means to generate the following: `[1, 2, 3, 4, 5]`, which
are all the elements in the lists the top-level list contains. Note that the
interview question does not ask for recursive flattening (at least in its first
iteration), but rather just one layer of flattening. An example of recursive
flattening would be turning `[[1, 2], [[3, 4], [5]]]` to `[1, 2, 3, 4, 5]`.

## A basic solution

The question, once understood, might come across as deceivingly simple: we can simply loop through the first iterator, then the second, and so on, until we iterate through all the inner elements. Namely, we just use two for loops:

```
def flatten_simple(top_it):
for it in top_it:
for x in it:
yield x
```

Very simple, but what could go wrong here? I encourage to stop here for a minute and think about it before reading further.

## Dealing with infinite iterators

Here's what could go wrong. Our code works perfectly well if all the inner
iterators are finite but if any of the iterators are infinite, we will get
stuck in them, and never visit any of the iterators coming after them. To
notice this, consider `[itertools.count(), [-1]]` as the input to the above
`flatten_simple` function. The output will simply be the output of
`itertools.count()` and `-1` will *never* be reached.

Noticing this, we could simply say that that's expected behaviour, and warn users about that behaviour in our documentation. However, let's assume we add the following requirement to the problem:

Write a program that flattens an iterator that iterates over iterators. If an objectxis visited by any of the iterators in a finite amount of time, your code should also visitxin a finite amount of time.

Reading that, it's not immediately obvious that such a thing is possible. We will look at this in detail in a bit, but first, let's come up with a way of visualizing the problem which will help us come up with a solution.

## Visualizing the problem

We can visualize the input as a potentially infinite graph that is not fully
"discovered" (i.e., the full shape of the graph is not known a priori, or even
in memory). The graph is based on the basic idea that we can treat the `next`
function as an directed edge pointing to the next item. An example can make
this clear. For the input `[[1, 2], [3], [4, 5, 6]]` we can draw the
following graph:

The graph has three types of nodes here, the yellow smaller ones correspond to starting points of iterators, the very top left corner one being the top-level input iterator. The orange ones are the individual elements generated by the inner iterators, and I use blue squares to indicate reaching the end of each iterator. In this way of visualizing things, you can think of the top-level iterator moving down vertically and generating the yellow nodes while each inner iterator moves horizontally, generating the orange nodes.

Now let's try visualizing an example with infinite iterators. Let's take
`[[-1, -2], itertools.count(), [-3, -4, -5]]` as the example. Here is what
the graph would look like:

## Viewing the problem as graph traversal

Visualized as above, it is easy to see the problem as traversing a potentially infinite graph. Our basic solution above was akin to a DFS traversal. Because of its depth-first approach, it of course had the problem of getting stuck in infinite branches of the graph. We can visualize the traversal corresponding to the above simple algorithm as the following animation, which shows how each node is discovered one at a time:

## A BFS approach to handle infinite iterators

To solve the issue of getting stuck in an infinite branch of the graph, we need to do a BFS style traversal of the graph instead of DFS. For this, as usual with BFS, we will use a queue, moving forward in each discovered iterator one at a time. The desired traversal, visualized as an animation, would look like the following:

One simple implementation of the above BFS approach is given below in Python.
You can view it as visiting items in the order of their distance from the
starting point (the top-left yellow node representing the top-level iterator
which is the input). We use a special item (the string `'top'`) in the queue
to keep track of visiting the top-level iterator items (yellow nodes in the
visualization) since they are the only nodes with two outgoing edges (which is
why we append two items onto the queue each time we visit one of these nodes.)

```
def flatten_bfs(input):
from collections import deque
top_iter = iter(input)
q = deque(['top'])
while q:
it = q.popleft()
if it == 'top':
try:
q.append('top')
q.append(iter(next(top_iter)))
except StopIteration:
pass
else:
try:
x = next(it)
yield x
q.append(it)
except StopIteration:
pass
```

## Set theoretic interpretation

As I mentioned before, the problem we looked at in this post, and the BFS-based solution to it, correspond to an important theorem and a possible proof of the theorem in set theory. Let's take a brief look at how that's the case. I will assume basic familiarity with set theory. If you want to learn more on the subject, I recommend Pinter's A Book of Set Theory or the more axiomatic treatment of Suppes in Axiomatic Set Theory.

The fundamental concept related to our algorithms above is that of countable
sets. Countable sets are, in a
limited sense, the mathematical equivalent of iterables in that they can be
iterated over (or "counted" hence the term "countable"). Let's define
$\mathcal{N} = \{0, 1, 2, 3, ...\}$
as the set of natural numbers. This
set is of course infinite, and corresponds to `itertools.count()` in Python.
A set is said to be *countable* if and only if it can be put in one-to-one and
onto correspondence with $\mathcal{N}$
or a subset of
$\mathcal{N}$
. The one-to-one and onto correspondence in this case is
analogous to an iterator, as the iterator can be seen as mapping each element
it iterates over to a natural number (starting from 0 and onward).

Back to the problem, our iterator containing iterators is then analogous to a (disjoint) union of a countable number of countable sets. The problem of flattening such an iterator in a way such that any element will be iterated over in a finite amount of time then mirrors the following theorem from set theory:

**Theorem**: Assume $I$
and $A_i$
for all $i \in I$
are all
countable. Then $\bigcup_{i \in I}A_i$
is also countable.

Finally, our BFS-style algorithm then is the equivalent of a possible proof of the above theorem, and can be translated to a formal mathematical proof of the theorem.

The theorem itself might seem a bit obvious at first but its proof is not completely trivial, and its ramifications are quite wide-ranging. For example, the proof that $\mathcal{Q}$ , the set of rational numbers, is countable is a simple corollary of this theorem and is usually relatively counter-intuitive for most people. Furthermore, since all countable subsets of real numbers can be shown to have zero measure, it follows that $\mathcal{Q}$ has zero measure, which is arguably even more counter-intuitive.

## Sources And Further Reading

As mentioned above, Pinter's A Book of Set Theory and Suppes's Axiomatic Set Theory are both good books to start learning about set theory.

In addition to the above, Cantor's original essays are still one of the best pieces of writing on the subject, if you don't mind the denser mathematics and older notation. I highly recommend Contributions to the Founding of the Theory of Transfinite Numbers as reading for anyone interested in set theory.

For for information on iterators in Python, PEP 234 Iterators is a good starting point.

## Comments