Pattern Matching in Fibonacci Words (ACM ICPC World Finals 2012)

I was looking at ACM ICPC world final problems the other day when this particular problem caught my eyes. Having spent a few hours working on the Fibonacci primitive roots problem it felt like a good choice for a problem to try to tackle. You can find the source here. It is also on the UVA online judge as problem 1282.

Fibonacci words are defined by letting F(0)=0,F(0) = \mathtt{0}, F(1)=1F(1) = \mathtt{1} and F(n)=F(n1)+F(n2)F(n)=F(n-1) + F(n-2) where ++ is the string concatenation operation. Hence F(n)F(n) is a string for n0.n \ge 0. Specifically, these strings are referred to as Fibonacci words.

The problem requires an efficient algorithm to find the number of occurrences of a given pattern pp in a given Fibonacci word F(n).F(n). Noting that the length of Fibonacci words grows exponentially, at an exponential rate given by the golden ratio ϕ\phi , generating the Fibonacci word in memory and then performing a search is simply not an option for large n.n.

For the rest of this post, let fnf_n be the regular Fibonacci sequence given by f0=F(0)=1f_0=|F(0)|=1 and f1=F(1)=1f_1=|F(1)=1 and fn=fn1+fn2f_n=f_{n-1}+f{n-2} for n2.n \ge 2.

Let O(n)O(n) be the number of occurrences of pp in F(n).F(n).

My solution involved first noticing the following about Fibonacci words:

  1. The lengths of Fibonacci words are given by F(n)=fn.|F(n)|=f_{n}.
  2. F(n)F(n) is a prefix of F(n+1).F(n+1).
  3. F(n)F(n) is a suffix of F(n+2).F(n+2). Hence F(n)F(n) is a suffix of F(n+2k)F(n+2k) for all k0.k \ge 0.
  4. For n3n \ge 3 we have F(n)=F(n1)+F(n2)=F(n2)+F(n3)+F(n2).F(n) = F(n-1) + F(n-2) = F(n-2) + F(n-3) + F(n-2).
  5. Similarly, n4n \ge 4 we have F(n)=F(n3)+F(n4)+F(n3)+F(n3)+F(n4).F(n) = F(n-3) + F(n-4) + F(n-3) + F(n-3) + F(n-4).

Let mm be the smallest number that is large enough such that p<F(m3)=fm3.|p| < |F(m-3)|=f_{m-3}. Then for nmn \ge m there are four exclusive possibilities for pp :

  1. it is not contained in F(n),F(n), or
  2. it is contained in F(n2),F(n-2), or
  3. it is contained in F(n3),F(n-3), or
  4. it occurs in F(n2)+F(n3)F(n-2)+F(n-3) and overlaps both, or
  5. it occurs in F(n3)+F(n2)F(n-3)+F(n-2) and overlaps both.

We can now find a recurrence relation for the number of times pp occurs in each of the above sub-cases. The first case is simply counted by O(n2)O(n-2) and the second by O(n3).O(n-3). Let O(n)O'(n) be the number of times pp occurs in F(n)+F(n1)F(n) + F(n-1) and overlaps both, and O(n)O''(n) the number of times pp occurs in F(n1)+F(n)F(n-1)+F(n) and overlaps both. Then for n>mn > m we have

O(n)=O(n1)+O(n2)+O(n2) O(n) = O(n-1) + O(n-2) + O'(n-2)

For O,O', given that n>m,n > m, we have

O(n)=O(n1). O'(n) = O''(n-1).

Similarly, for O,O'', given that n>m,n > m, we have

O(n)=O(n1). O''(n) = O'(n-1).

This is enough information to come up with a rather efficient algorithm:

  1. First find mm as described above, by finding the first number such that p<fm3.|p|<f_{m-3}.
  2. Then find the number of occurrences of pp in F(m)=F(n2)+F(n3)+F(n2)F(m) = F(n-2) + F(n-3) + F(n-2) using an efficient pattern matching algorithm such as KMP. Classify each occurrence as one of the sub-cases above, and by doing so find the values for O(m3),O(m-3), O(n2),O(n-2), O(m2)=O(m1),O'(m-2)=O''(m-1), and O(m2)=O(m1)O''(m-2)=O'(m-1) which are going to be our base cases.
  3. Now, based on the recurrences above, and in a bottom to top approach, calculate the value of O(n)O(n) for the given n.n.

Since fm4p<fm3f_{m-4} \le |p| < f_{m-3} we have F(m)=fmϕ4p|F(m)| = f_{m} \approx \phi^4|p| and so F(m)=O(p).|F(m)| = O(|p|). So the KMP string matching part of this algorithm will take O(p)O(|p|) time and the rest will take O(n)O(n) time. Hence the total time is O(p+n)O(|p|+n) for finding a pattern of length p|p| in F(n).F(n). This is definitely reasonable since pp is much shorter compared to F(n).F(n).

Here's the Java code for the above algorithm. I use a KMP implementation that I translated over from an older Python project I did. I might write a post on KMP itself soon!

import java.util.LinkedList;
import java.util.Scanner;

public class Problem1282 {
    public static String[] F;
    public static long[] f;

    // Computes F up until F[m] such that M < F[m-3]
    public static void precomputeF(long M) {
        F = new String[101];
        F[0] = "0";
        F[1] = "1";
        for (int i = 1; i < 3 || F[i - 3].length() < M; i++) {
            F[i + 1] = F[i] + F[i - 1];
        }
    }

    public static void precomputef(int M) {
        f = new long[M + 1];
        f[0] = 1;
        f[1] = 1;

        for (int i = 1; i < M; i++) {
            f[i + 1] = f[i] + f[i - 1];
        }
    }

    public static long occurrences(int n, String p) {
        if (p.equals("0")) {
            if (n < 2) {
                return n == 0 ? 1 : 0;
            }
            return f[n - 2];
        }
        if (p.equals("1")) {
            if (n < 2) {
                return n == 0 ? 0 : 1;
            }
            return f[n - 1];
        }

        int m = 3;
        while (f[m - 3] <= p.length()) {
            m++;
        }

        if (m > n) {
            m = n;
        }

        long a = 0; // This will start with O(m-3)
        long b = 0; // And this one with O(m-2)
        long[] o = new long[2]; // Overlapping count

        // Only need to find occurrences up until index f[m-1] since past that
        // we are back to F(n-2) and since F(n) starts with F(n-2) we have
        // already found all the occurrences in F(n-2) already.
        LinkedList<Integer> occurrences = KMP.search(F[m], p, (int) f[m - 1]);
        for (int found : occurrences) {
            if (found < f[m - 2]) {
                if (found + p.length() - 1 < f[m - 2]) {
                    b++;
                } else {
                    o[0]++;
                }
            } else if (found < f[m - 1]) {
                if (found + p.length() - 1 < f[m - 1]) {
                    a++;
                } else {
                    o[1]++;
                }
            }
        }

        int i;
        for (i = 0; i <= n - m + 1; i++) {
            long old_b = b;
            b = b + a + o[i % 2];
            a = old_b;
        }

        return b;
    }

    public static void main(String[] args) {
        precomputeF(100000);
        precomputef(100);

        java.util.Scanner scanner = new Scanner(System.in);
        for (int c = 1; scanner.hasNext(); c++) {
            int n = Integer.parseInt(scanner.nextLine());
            String p = scanner.nextLine();
            System.out.format("Case %d: %d\n", c, occurrences(n, p));
        }
    }
}

class KMP {
    // Calculate the border array (i.e. failure function) of string x.
    // Direct translation of my older Python code
    private static int[] borderArray(String s) {
        char[] x = s.toCharArray();
        int n = x.length;
        int[] beta = new int[n];
        int b = 0;
        for (int i = 1; i < n; i++) {
            while (b > 0 && x[i] != x[b]) {
                b = beta[b - 1];
            }
            beta[i] = b = (x[i] == x[b] ? b + 1 : 0);
        }
        return beta;
    }

    // Return an array of indices where p occurs in x.
    // Overlapping matches are allowed. For example kmpSearch("ababa", "aba")
    // yields 0 and 2.
    public static LinkedList<Integer> search(String xx, String pp, int max_index) {
        char[] x = xx.toCharArray();
        char[] p = pp.toCharArray();
        int[] beta = borderArray(pp);
        int i = 0, m = 0;
        LinkedList<Integer> results = new LinkedList<Integer>();
        while (m + i < x.length) {
            if (m > max_index) {
                break;
            }
            if (x[m + i] == p[i]) {
                if (i == p.length - 1) {
                    results.add(m);
                } else {
                    i++;
                    continue;
                }
            }
            if (i > 0) {
                m = m + i - beta[i - 1];
                i = beta[i - 1];
            } else {
                m = m + 1;
                i = 0;
            }
        }

        return results;
    }
}

Comments