Pattern Matching in Fibonacci Words (ACM ICPC World Finals 2012)

I was looking at ACM ICPC world final problems the other day when this particular problem caught my eyes. Having spent a few hours working on the Fibonacci primitive roots problem it felt like a good choice for a problem to try to tackle. You can find the source here. It is also on the UVA online judge as problem 1282.

Fibonacci words are defined by letting $F(0) = \mathtt{0},$ $F(1) = \mathtt{1}$ and $F(n)=F(n-1) + F(n-2)$ where $+$ is the string concatenation operation. Hence $F(n)$ is a string for $n \ge 0.$ Specifically, these strings are referred to as Fibonacci words.

The problem requires an efficient algorithm to find the number of occurrences of a given pattern $p$ in a given Fibonacci word $F(n).$ Noting that the length of Fibonacci words grows exponentially, at an exponential rate given by the golden ratio $\phi$ , generating the Fibonacci word in memory and then performing a search is simply not an option for large $n.$

For the rest of this post, let $f_n$ be the regular Fibonacci sequence given by $f_0=|F(0)|=1$ and $f_1=|F(1)=1$ and $f_n=f_{n-1}+f{n-2}$ for $n \ge 2.$

Let $O(n)$ be the number of occurrences of $p$ in $F(n).$

My solution involved first noticing the following about Fibonacci words:

The lengths of Fibonacci words are given by $|F(n)|=f_{n}.$
$F(n)$ is a prefix of $F(n+1).$
$F(n)$ is a suffix of $F(n+2).$ Hence $F(n)$ is a suffix of $F(n+2k)$ for all $k \ge 0.$
For $n \ge 3$ we have $F(n) = F(n-1) + F(n-2) = F(n-2) + F(n-3) + F(n-2).$
Similarly, $n \ge 4$ we have $F(n) = F(n-3) + F(n-4) + F(n-3) + F(n-3) + F(n-4).$

Let $m$ be the smallest number that is large enough such that $|p| < |F(m-3)|=f_{m-3}.$ Then for $n \ge m$ there are four exclusive possibilities for $p$ :

it is not contained in $F(n),$ or
it is contained in $F(n-2),$ or
it is contained in $F(n-3),$ or
it occurs in $F(n-2)+F(n-3)$ and overlaps both, or
it occurs in $F(n-3)+F(n-2)$ and overlaps both.

We can now find a recurrence relation for the number of times $p$ occurs in each of the above sub-cases. The first case is simply counted by $O(n-2)$ and the second by $O(n-3).$ Let $O'(n)$ be the number of times $p$ occurs in $F(n) + F(n-1)$ and overlaps both, and $O''(n)$ the number of times $p$ occurs in $F(n-1)+F(n)$ and overlaps both. Then for $n > m$ we have

O(n) = O(n-1) + O(n-2) + O'(n-2)

For $O',$ given that $n > m,$ we have

O'(n) = O''(n-1).

Similarly, for $O'',$ given that $n > m,$ we have

O''(n) = O'(n-1).

This is enough information to come up with a rather efficient algorithm:

First find $m$ as described above, by finding the first number such that $|p|<f_{m-3}.$
Then find the number of occurrences of $p$ in $F(m) = F(n-2) + F(n-3) + F(n-2)$ using an efficient pattern matching algorithm such as KMP. Classify each occurrence as one of the sub-cases above, and by doing so find the values for $O(m-3),$ $O(n-2),$ $O'(m-2)=O''(m-1),$ and $O''(m-2)=O'(m-1)$ which are going to be our base cases.
Now, based on the recurrences above, and in a bottom to top approach, calculate the value of $O(n)$ for the given $n.$

Since $f_{m-4} \le |p| < f_{m-3}$ we have $|F(m)| = f_{m} \approx \phi^4|p|$ and so $|F(m)| = O(|p|).$ So the KMP string matching part of this algorithm will take $O(|p|)$ time and the rest will take $O(n)$ time. Hence the total time is $O(|p|+n)$ for finding a pattern of length $|p|$ in $F(n).$ This is definitely reasonable since $p$ is much shorter compared to $F(n).$

Here's the Java code for the above algorithm. I use a KMP implementation that I translated over from an older Python project I did. I might write a post on KMP itself soon!

import java.util.LinkedList;
import java.util.Scanner;

public class Problem1282 {
    public static String[] F;
    public static long[] f;

    // Computes F up until F[m] such that M < F[m-3]
    public static void precomputeF(long M) {
        F = new String[101];
        F[0] = "0";
        F[1] = "1";
        for (int i = 1; i < 3 || F[i - 3].length() < M; i++) {
            F[i + 1] = F[i] + F[i - 1];
        }
    }

    public static void precomputef(int M) {
        f = new long[M + 1];
        f[0] = 1;
        f[1] = 1;

        for (int i = 1; i < M; i++) {
            f[i + 1] = f[i] + f[i - 1];
        }
    }

    public static long occurrences(int n, String p) {
        if (p.equals("0")) {
            if (n < 2) {
                return n == 0 ? 1 : 0;
            }
            return f[n - 2];
        }
        if (p.equals("1")) {
            if (n < 2) {
                return n == 0 ? 0 : 1;
            }
            return f[n - 1];
        }

        int m = 3;
        while (f[m - 3] <= p.length()) {
            m++;
        }

        if (m > n) {
            m = n;
        }

        long a = 0; // This will start with O(m-3)
        long b = 0; // And this one with O(m-2)
        long[] o = new long[2]; // Overlapping count

        // Only need to find occurrences up until index f[m-1] since past that
        // we are back to F(n-2) and since F(n) starts with F(n-2) we have
        // already found all the occurrences in F(n-2) already.
        LinkedList<Integer> occurrences = KMP.search(F[m], p, (int) f[m - 1]);
        for (int found : occurrences) {
            if (found < f[m - 2]) {
                if (found + p.length() - 1 < f[m - 2]) {
                    b++;
                } else {
                    o[0]++;
                }
            } else if (found < f[m - 1]) {
                if (found + p.length() - 1 < f[m - 1]) {
                    a++;
                } else {
                    o[1]++;
                }
            }
        }

        int i;
        for (i = 0; i <= n - m + 1; i++) {
            long old_b = b;
            b = b + a + o[i % 2];
            a = old_b;
        }

        return b;
    }

    public static void main(String[] args) {
        precomputeF(100000);
        precomputef(100);

        java.util.Scanner scanner = new Scanner(System.in);
        for (int c = 1; scanner.hasNext(); c++) {
            int n = Integer.parseInt(scanner.nextLine());
            String p = scanner.nextLine();
            System.out.format("Case %d: %d\n", c, occurrences(n, p));
        }
    }
}

class KMP {
    // Calculate the border array (i.e. failure function) of string x.
    // Direct translation of my older Python code
    private static int[] borderArray(String s) {
        char[] x = s.toCharArray();
        int n = x.length;
        int[] beta = new int[n];
        int b = 0;
        for (int i = 1; i < n; i++) {
            while (b > 0 && x[i] != x[b]) {
                b = beta[b - 1];
            }
            beta[i] = b = (x[i] == x[b] ? b + 1 : 0);
        }
        return beta;
    }

    // Return an array of indices where p occurs in x.
    // Overlapping matches are allowed. For example kmpSearch("ababa", "aba")
    // yields 0 and 2.
    public static LinkedList<Integer> search(String xx, String pp, int max_index) {
        char[] x = xx.toCharArray();
        char[] p = pp.toCharArray();
        int[] beta = borderArray(pp);
        int i = 0, m = 0;
        LinkedList<Integer> results = new LinkedList<Integer>();
        while (m + i < x.length) {
            if (m > max_index) {
                break;
            }
            if (x[m + i] == p[i]) {
                if (i == p.length - 1) {
                    results.add(m);
                } else {
                    i++;
                    continue;
                }
            }
            if (i > 0) {
                m = m + i - beta[i - 1];
                i = beta[i - 1];
            } else {
                m = m + 1;
                i = 0;
            }
        }

        return results;
    }
}

Math ∪ Code

by Sahand Saba

Pattern Matching in Fibonacci Words (ACM ICPC World Finals 2012)

Comments