Understanding Recurrence Relations Using Automata, Python Code, And Javascript Visualizations

Introduction

Recurrence relations are very often taught in first- or second-year computer science and discrete mathematics courses. In this post, I take a somewhat different, more visual and much more programming-oriented, approach to understanding linear recurrences by drawing links between linear recurrences, finite-state machines, and matrices. We will be using the problem of generating all domino-tilings of a board as the springboard, and explore several interesting and related concepts as tangents on the way. Code in Python, and visualizations in JavaScript are used to demonstrate the ideas throughout, and a few exercises are provided if you want to challenge yourself further while reading.

The Problem

Let's start with a simple problem. You are given a \(2 \times n\) board and an unlimited number of \(2 \times 1\) dominoes. How many ways can you cover the board using the dominoes?

Let's look at an example. When \(n = 4\) there are \(5\) ways of tiling the board:

Instead of finding a formula to calculate the number of ways such tilings are possible, let's solve a slightly harder problem: generating all the tilings. I recommend you take a moment to think about how you would approach solving this problem, as there are several approaches that work. The simplest solution is to use recursion. Starting with a blank \(2 \times n\) board, the first column can either be covered using a single vertical domino, or two horizontal ones:

We are then either left with a \(2 \times (n-1)\) board or \(2 \times (n-2)\) board, respectively. Representing the choice of a single vertical domino with '|' and two horizontal dominoes with '==', we get the following recursive code to generate all the tilings. This way of representing our choices has the advantage that the length of the string represents the length of the board.

def tilings(n):
    if n < 0:
        return
    if n == 0:
        yield ''
    for tiling in tilings(n - 1):
        yield '|' + tiling
    for tiling in tilings(n - 2):
        yield '==' + tiling

For example, the generated sequence of strings for \(n=4\) is provided below, which corresponds precisely to the diagram above.

||||
||==
|==|
==||
====

The recursion tree can be pictured as shown below, with the leaves of the tree representing the final tilings.

The above code is of course a simple generalization of the recurrence given by

\begin{align*} F(n) &= F(n - 1) + F(n - 2), \\ F(0) &= 1, \\ F(1) &= 1. \end{align*}

This linear recurrence, called the Fibonacci recurrence, will look (perhaps nauseatingly) familiar to most computer science and math students. It is interesting to point out that the self-symmetry present in the recurrence equation is present in the recursion tree as well. The following animated diagram demonstrates this symmetry.

Finite State Machines And Languages

Now, let's consider a somewhat more difficult problem. This time, we want to cover a \(m \times n\) board using the same dominoes. Let's first look at an example for \(n=4\) and \(m=3\):

Of course here we can revert back to the same method we used previously and try to cover the first column first and recursively continue from there, but the logic gets a little complicated because tiling the first column this time does not neatly result in the second column either being completely empty or completely covered as it did with \(m=2\). This gets even more complicated with \(m=4\). Have a look at the tilings for \(m=4\) and \(n = 4\):

So, instead of continuing in the same manner, let's see if we can create a non-deterministic finite-state machine (FSM) that would recognize valid tilings, which can then also be used to generate all valid tilings.

The basic idea is this: let the states of the FSM be all the \(2^m\) ways that a single column can have covered spots. More explicitly, the set of states will be all the subsets of \(\{0, 1, ..., m-1\}\). We will use zero-based indexing to keep it consistent with Python's (and JavaScript's) lists. The transitions of the FSM will be provided by all possible ways to cover the empty tiles in the source state, and the label for each transition will be the dominoes used and how they are to be placed. An example will make this much more clear. Here is what the FSM will look like for \(m=2\). Note that only states reachable from the fully uncovered state are shown.

For \(n=3\), the FSM, given below, is much more interesting. Note how the first column of the label always fills exactly the gaps in the source state, and how the second column of the label is the target state.

One quick note regarding the diagrams. Earlier, I said that the states of the FSM will be all the \(2^m\) ways that a single column can have covered spots. In the diagrams though, there are fewer states shown. Why is that? Well, some states are omitted because they were unreachable from the empty initial state and were therefore irrelevant provided we are counting how many ways we can tile an empty board.

Let's look at the Python code that constructs this FSM, given \(m\). Note that the FSM is independent of \(n\). We will discuss how \(n\) comes into play in a bit. Each state in the FSM is represented as a frozen set of the empty rows in the column. (Frozen sets are used because sets are mutable and hence can not be used as keys for a hash-table.) The FSM is represented by a hash-table with the states as the keys and the values being a list of the outgoing transitions of the state. The transitions are pairs, with the first item in the pair being the target state, and the second the list of domino placements that form the label. The algorithm to create the FSM basically starts from the all-empty column and recursively creates all its outgoing transitions. Note how in this more general example, we are still using the same recursive idea that we used for the \(2 \times n\) case. Finally, every time a new target is created, it is pushed onto a processing queue, so that its outgoing transitions can be calculated, using a BFS-like algorithm.

from collections import deque


def construct_fsm(m):
    """Construct an FSM that recognizes domino tilings of an m x n board."""
    fsm = {}
    starting_state = frozenset(range(m))
    Q = deque()

    def construct_outgoing_transitions(source):
        if source in fsm:
            return  # This state has already been processed
        fsm[source] = []
        if not source:  # No empty tiles
            fsm[source].append((starting_state, []))
            return

        def recurse(j, target, tiling):
            if j > max(source):
                # End of recursion, process the result
                fsm[source].append((target, tiling))
                Q.append(target)
            elif j in source:
                if j + 1 in source:
                    # Place vertical domino starting at row j
                    recurse(j + 2, target, tiling + [(j, 'V')])
                # Place horizontal domino starting at row j
                recurse(j + 1, target - {j}, tiling + [(j, 'H')])

        recurse(min(source), starting_state, [])

    # Construct the graph in a BFS-like order
    Q.append(starting_state)
    while Q:
        state = Q.popleft()
        construct_outgoing_transitions(state)

    return fsm

As an exercise, you can try to generalize the code so that other possible shapes of dominoes are allowed in the tilings as well.

In general, given a non-deterministic FSM, it is rather easy to generate all strings using a DFS traversal of the FSM. Note that transitions with empty labels, (also known as \(\epsilon\)-transitions), can be problematic here, since you can get caught in an infinite loop if a cycle with only empty labels (also known as an \(\epsilon\)-cycle) is encountered. Of course, we can improve our DFS algorithm to detect this and prevent it, but turns out that for this particular use-case no \(\epsilon\)-cycle can ever be created by our construct_fsm code. Try to reason for yourself why this is the case. Hint: what does an empty label on a transition imply about the source of the transition?

Exercise: while \(\epsilon\)-cycles are not issue for this particular use-case, they can be if the FSM is generated for other purposes. How would you handle \(\epsilon\)-cycles? Can you do some preprocessing to remove them entirely? If not, how would you handle them while traversing the FSM?

Here is the code to generate all strings of length \(n\) generated by a given FSM:

def generate_fsm_language(fsm, n, initial_state, accepting_states):
    """
    Generate all strings of length n accepted by the given FSM.
    Assumes no epsilon-cycles exist in the FSM.
    """

    def dfs(current_state, string_so_far):
        if len(string_so_far) == n:
            if current_state in accepting_states:
                yield string_so_far
            return
        for target_state, label in fsm[current_state]:
            yield from dfs(target_state, string_so_far + [label])

    yield from dfs(initial_state, [])

With the code to construct the FSM and the code to generate strings accepted by an FSM, the task of generating all the tilings of an \(m \times n\) board becomes a simple application of the two algorithms:

from construct_fsm import construct_fsm
from fsm_language import generate_fsm_language


def tilings(m, n):
    """Generate all tilings of an m x n board using dominos."""
    fsm = construct_fsm(m)
    initial_state = accepting_state = frozenset(range(m))
    yield from generate_fsm_language(fsm, n, initial_state, [accepting_state])

Here is an example output for \(m=2\) and \(n=4\), with corresponding diagram given immediately after.

[[[(0, 'V')], [(0, 'V')], [(0, 'V')], [(0, 'V')]],
 [[(0, 'V')], [(0, 'V')], [(0, 'H'), (1, 'H')], []],
 [[(0, 'V')], [(0, 'H'), (1, 'H')], [], [(0, 'V')]],
 [[(0, 'H'), (1, 'H')], [], [(0, 'V')], [(0, 'V')]],
 [[(0, 'H'), (1, 'H')], [], [(0, 'H'), (1, 'H')], []]]

Graphs, Paths, And Adjacency Matrices

Let's move our focus from generating all the tilings to just counting them. Of course, doing the former would also achieve the latter, but is it possible to count the possible tilings quicker than generating them? This is an interesting question to ask in general. Namely, if you are given a description of a finite set of mathematical objects \(S\), is it possible to count how many objects are in the set based on the description more efficiently than it is possible to list all the objects? This question is related to the #P and #P-complete classes of computational complexity, in case you are interested in reading about them. For some problems, the answer is simply no. In other words, if you want to count how many objects are in \(S\), the best you can do is iterate through every object. An example of a problem where this is the case is counting the number of assignments that satisfy a given boolean formula, or counting the number of linear extensions of a poset.

OK, back to the problem at hand. In our case, it is possible to calculate the total number of possible tilings much faster that it is possible to iterate through all of them. To see how this is possible, we will first look at a few basic results from graph theory. First is that, treating the FSM as a directed graph (by ignoring the labels), means that the FSM can be represented by an adjacency matrix. The process to generate the adjacency matrix is relatively easy: first map every state to an index from \(0\) to \(s-1\) where \(s\) is the number of states (I use a simple Python trick mentioned here to do this), and then go through each transition and increment the corresponding entry in the matrix. The code will be as simple as this:

from collections import defaultdict
from itertools import count


def fsm_to_matrix(fsm):
    state_to_index = defaultdict(count().__next__)
    M = [[0] * len(fsm) for state in fsm]
    for source_state, transitions in fsm.items():
        for target_state, label in transitions:
            M[state_to_index[source_state]][state_to_index[target_state]] += 1
    return M

Let's look at a couple of example matrices. For \(m=2\), with the simple two-state FSM given the previous section, the adjacency matrix is given by:

\begin{equation*} A = \left[ \begin{array}{cc} 0 & 1\\ 1 & 1 \end{array} \right] \end{equation*}

And for \(m=3\), with the 6-state FSM also given in the previous section, the matrix is given by:

\begin{equation*} A_3 = \left[ \begin{array}{cccccc} 0 & 1 & 0 & 0 & 0 & 0\\ 1 & 0 & 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1 & 0 & 0\\ 0 & 0 & 1 & 0 & 1 & 0\\ 0 & 1 & 0 & 1 & 0 & 1\\ 0 & 0 & 0 & 0 & 1 & 0 \end{array} \right] \end{equation*}

Why this new presentation of the same structure, you ask? Well, at first glance it might seem like we merely lost some information, namely the labels. On the other hand, now we are dealing with numbers and matrices, both of which are algebraic structures, allowing us new ways of approaching the counting problem. The key theorem that will help us here is that given the adjacency matrix \(A\) of a graph, you can count the number of paths of length \(n\) from a given vertex to another by first raising \(A\) to the power of \(n\) to get \(A^n\) and then simply looking at the corresponding entry. An inductive proof for this theorem is relatively easy. You can also read more about it this paper on the mathematics in Good Will Hunting, which goes into some detail. For example, in the case of \(m=2\) and \(n=4\) we have

\begin{equation*} A^4 = \left[ \begin{array}{cc} 0 & 1\\ 1 & 1 \end{array} \right]^4 = \left[ \begin{array}{cc} 2 & 3\\ 3 & 5 \end{array} \right] \end{equation*}

Since the initial state (representing all empty rows) is mapping to index \(1\) (remember indices are zero-based so index \(1\) is the second row/column) we have that the number of paths of length \(4\) from the initial state to itself is \(5\), as expected. Remember the \(5\) possible tilings shown at the beginning of the article? There it is.

We reduced the problem of counting the number of tilings to that of exponentiating matrices. This is good, because as it turns out, there is a very fast algorithm for exponentiation, known as a "exponentiation by squaring", which reduces the number of total multiplications from \(n-1\) to \(O(\log(n))\). You can read up more on it on the Wikipedia article on exponentiation by squaring. I also talked about it in a few of my articles, in particular my post on RSA. To recap, the basic recursion for it is:

\begin{equation*} A^m= \begin{cases} (A^{\frac{m}{2}})^2 & \mbox{ if } m \mbox{ is even} \\ A \cdot (A^{\frac{m-1}{2}})^2 & \mbox{ if } m \mbox{ is odd}. \end{cases} \end{equation*}

Great. Now we can count the total number of possible tilings quite quickly, even for very large \(n\).

Asymptotic Behaviour, Eigenvalues and Eigenvectors

Is it possible to use the matrix formulation of the counting problem to gain insight into the asymptotic behaviour of the number of tilings? In other words, can we tell if the number of tilings as a function of \(n\) is \(O(n)\), \(O(n^2)\), \(O(2^n)\) or something entirely different? To get a better sense of how this function behaves let's look at eigenvalues and how they come in handy in this case.

Let's start with our \(A\) matrix given above and assume that there exists a vector \(\vec{v} = \left[\begin{array}{c}a \\ b\end{array}\right]\) that, when transformed by \(A\), is only scaled. In other words, we are looking for a vector \(\vec{v}\) and a scalar \(\lambda \in \mathbb{R}\) such that \(A \vec{v} = \lambda \vec{v}\). Such a vector \(\vec{v}\) is called and eigenvector of the matrix, and \(\lambda\) is called an eigvenvalue of it. Of course, at this point you might be wondering why we chose to look for such a vector and number. What's the motivation? The idea is quite simple. Thinking of the matrix as a linear transformation of vectors, what we are seeking here is a vector that does not change in direction when transformed using the matrix, and only scaled. Let's look at this in action by looking at how the transformation \(A\) acts on a simple circle. Press the "Transform" button few times. You can press "Reset" and start over if you like.

As you almost certainly noticed, the circle ends up getting stretched along a single line. Based on this observation, we can guess that if we can figure out what this line is, we can find the first eigenvalue. Let's try this approach and see what happens. First, we start with a simple vector, say \([1, 0]\), and keep transforming it using \(A\), normalizing it at each step:

from linear_alg import transform, normalized


def estimate_eigens(M, starting_vector, iterations=20):
    v = starting_vector
    for i in range(iterations):
        v1 = transform(M, v)
        v2 = transform(M, v)
        if i > 0:
            scales = [y / x for (x, y) in zip(v, v2)]
            s_min, s_max = min(scales), max(scales)
            print s_min, s_max, s_max - s_min
        v = normalized(v1)


def main():
    M = [[0, 1], [1, 1]]
    estimate_eigens(M, [0, 1])


if __name__ == '__main__':
    main()

Here's the output:

1.0 2.0 1.0
1.5 2.0 0.5
1.5 1.66666666667 0.166666666667
1.6 1.66666666667 0.0666666666667
1.6 1.625 0.025
1.61538461538 1.625 0.00961538461538
1.61538461538 1.61904761905 0.003663003663
1.61764705882 1.61904761905 0.00140056022409
1.61764705882 1.61818181818 0.000534759358289
1.61797752809 1.61818181818 0.00020429009193
1.61797752809 1.61805555556 7.80274656675e-05
1.61802575107 1.61805555556 2.98044825939e-05
1.61802575107 1.61803713528 1.13842055531e-05
1.61803278689 1.61803713528 4.34839326857e-06
1.61803278689 1.61803444782 1.66093643617e-06
1.6180338134 1.61803444782 6.34421557066e-07
1.6180338134 1.61803405573 2.42327429234e-07
1.61803396317 1.61803405573 9.25608476532e-08
1.61803396317 1.61803399852 3.53550972942e-08

Notice that the distance between the maximum scale factor and the minimum scale factor gets smaller quite quickly, meaning we are converging on an eigenvalue. Now, here's a quick math pop knowledge quiz: what number is the number that we are converging to, namely \(1.61803396317\)? The answer is the golden ratio, which is not surprising since the matrix was associated with the Fibonacci sequence.

Needless to say that the algorithm above for calculating eigenvalues is very basic and can be improved on in many ways. Eigenvalues and eigenvectors are so useful that considerable research has gone into calculating them in efficient and numerically accurate ways. The most frequently mentioned use of eigenvalues and eigenvectors is Google's page-rank algorithm. You can read more about how eigenvectors were used in Google's page-rank in the paper The $25,000,000,000 Eigenvector: The Linear Algebra Behind Google.

Before we look at how the found eigenvalue is of use, let's delve a bit deeper into the concepts of fixed points.

Fixed Points And Recursion

The ideas involved in the definitions of eigenvalues and eigenvectors, and the basic algorithm shown in the previous section for calculating them, are instances of a much more general idea, that of functions and their fixed points. As a bit of an interesting tangent (this post has been full of tangents, hasn't it?), let's delve into this generalization a bit here and see how it's related to both eigenvalues, and also recursively defined functions. We will then briefly use these ideas to solve a classic computer science puzzle, that of defining anonymous recursive functions.

Dynamical systems are mathematical models of systems that evolve over time according to a fixed set of rules, allowing the changes in the system to be studied as iterative applications of the rules. In the previous section, we looked at iterative transformation of a set of points on the two-dimensional plane. This basic system can be seen as an example of a dynamical system, in which the rule of change is given by the matrix transformation. We then looked for fixed points (up to scalar multiplication) of this dynamical system, promising that they will allow us to understand the asymptotic behaviour of the total number of domino tilings.

Generalizing a dynamical system to be any function from a set to itself (e.g. from the set of points on the plane to the set of points on the plane), we can look at recursively defined functions as fixed points of a dynamical system too. To understand this, consider the Fibonacci recursion given by \(F(n) = F(n - 1) + F(n - 2)\). Instead of thinking of it as a function that given a number outputs another number, we can instead generalize and consider a meta-function \((G(F))(n) = F(n -1) + F(n - 2)\) where \(G\) is a function which takes \(F\) as a parameter and returns a function that is based on \(F\). For example, if \(F(n) = n\) then \((G(F))(n) = F(n - 1) + F(n-2) = n - 1 + n - 2 = 2n - 3\).

Notice that since \(G\) takes a function and returns a function, we can think of it as a dynamical system. The natural question to ask then is whether this dynamical system has fixed points. A fixed point of \(G\) would be precisely a function that would satisfy \(F(n) = (G(F))(n) = F(n - 1) + F(n-2)\). Hence any function satisfying the Fibonacci recursion is a fixed point of this dynamical system.

Is it possible to find fixed points of a meta-function such as \(G\) in general? The Y combinator does exactly this. A Python version of the combinator looks like the following:

Y = lambda f: (lambda x: x(x))(lambda x: f(lambda y: x(x)(y)))

Combining this with \(G\) given above, we can calculate the Fibonacci numbers recursively, using anonymous lambdas entirely:

print (lambda f: (lambda x: x(x)) (lambda x: f(lambda y: x(x)(y))))(
       lambda f: (lambda n: n if n < 2 else (f(n - 1) + f(n - 2))))(10)
# Prints F(10) = 55

Pretty cool, right? Of course, this is just having fun with theory. Needless to say, don't write anything like this in production code!

Back To Eigenvalues and Eigenvectors

Getting back to eigenvalues and eigenvectors as fixed points (up to scalar multiplication) of a matrix transformation, we can see that if we happen to find enough eigenvectors that are linearly independent we can use the set of eigenvectors as the basis. This will allow us to write our matrix as a simple diagonal matrix in that basis. A full treatment of eigenvalues and eigenvectors is well outside the scope of this article. The goal here is to just show how eigenvectors and eigenvalues are related to the rest of the topics discussed, and to show one of the their applications, which is in approximating asymptotic behaviour of linear recursions. For a full treatment, any book on linear algebra will do. I personally recommend Jim Hefferon's free textbook.

The Wikipedia article on eigenvectors does a good job of showing how a change of coordinates to a basis formed by the eigenvectors of a matrix results in the diagonal matrix. I recommend skimming over that section since I will be using that result here.

Remember that for the simple case of \(m=2\), the matrix \(A\) was given as

\begin{equation*} A = \left[ \begin{array}{cc} 0 & 1\\ 1 & 1 \end{array} \right] \end{equation*}

Let's assume for now that the matrix \(A\) has two eigenvalues, \(\lambda_1\) and \(\lambda_2\). We already estimated that \(\lambda_1 \approx 1.618033\). For the sake of this article, we care less about the numerical values of eigenvalues and more about the fact that we have two distinct eigenvalues with linearly independent eigenvectors. Assume the corresponding eigenvectors are \(v_1\) and \(v_2\), and let \(V\) be \(2\times 2\) matrix with \(v_1\) and \(v_2\) as the columns. Using the result mentioned in the Wikipedia page, we have

\begin{equation*} A = V \left[ \begin{array}{cc} \lambda_1 & 0\\ 0 & \lambda_2 \end{array} \right] V^{-1} \end{equation*}

This is good news, because it means we can write powers of \(A\)

\begin{equation*} A^n = V \left[ \begin{array}{cc} \lambda_1^n & 0\\ 0 & \lambda_2^n \end{array} \right] V^{-1} \end{equation*}

(If you can't immediately see why, take minute to convince yourself of the previous equation.) With this last result, we see how the eigenvalues determine the asymptotic behaviour of the recursion. In the case of \(A\), we have

\begin{equation*} \lambda_1 = \varphi = \frac{1 + \sqrt{5}}{2} \approx 1.61803398\cdots \end{equation*}

And hence \(F(n) = O(\varphi ^ n)\), with \(\lambda_2\) not playing a role in the asymptotic behaviour since \(0 < \lambda_2 < 1\). (More specifically, because \(\lambda1 > \lambda_2 > 0\).)

Conclusions

I started writing this post originally just to tackle the simple problem of generating all domino-tilings of a \(2 \times n\) board. As I started writing it, I decided to generalize it in various ways and thought of various tangents that the reader might find interesting, especially for readers that might not have seen how several seemingly unrelated concepts can be brought together using this simple problem. In the end the post ended up being a visual explanation of the solution to generating all the domino-tilings of a board, and using that problem as a springboard to explore several related concepts and areas of mathematics and computer science, namely languages, finite-state machines, graphs, eigenvalues and eigenvectors, fixed points, and the Y combinator. Any suggestions, questions, or comments? Feel free to leave them below, and thanks for reading!

Comments