Pattern Matching in Fibonacci Words (ACM ICPC World Finals 2012)

I was looking at ACM ICPC world final problems the other day when this particular problem caught my eyes. Having spent a few hours working on the Fibonacci primitive roots problem it felt like a good choice for a problem to try to tackle. You can find the source here. It is also on the UVA online judge as problem 1282.

Fibonacci words are defined by letting \(F(0) = \mathtt{0},\) \(F(1) = \mathtt{1}\) and \(F(n)=F(n-1) + F(n-2)\) where \(+\) is the string concatenation operation. Hence \(F(n)\) is a string for \(n \ge 0.\) Specifically, these strings are referred to as Fibonacci words.

The problem requires an efficient algorithm to find the number of occurrences of a given pattern \(p\) in a given Fibonacci word \(F(n).\) Noting that the length of Fibonacci words grows exponentially, at an exponential rate given by the golden ratio \(\phi\), generating the Fibonacci word in memory and then performing a search is simply not an option for large \(n.\)

For the rest of this post, let \(f_n\) be the regular Fibonacci sequence given by \(f_0=|F(0)|=1\) and \(f_1=|F(1)=1\) and \(f_n=f_{n-1}+f{n-2}\) for \(n \ge 2.\)

Let \(O(n)\) be the number of occurrences of \(p\) in \(F(n).\)

My solution involved first noticing the following about Fibonacci words:

  1. The lengths of Fibonacci words are given by \(|F(n)|=f_{n}.\)
  2. \(F(n)\) is a prefix of \(F(n+1).\)
  3. \(F(n)\) is a suffix of \(F(n+2).\) Hence \(F(n)\) is a suffix of \(F(n+2k)\) for all \(k \ge 0.\)
  4. For \(n \ge 3\) we have \(F(n) = F(n-1) + F(n-2) = F(n-2) + F(n-3) + F(n-2).\)
  5. Similarly, \(n \ge 4\) we have \(F(n) = F(n-3) + F(n-4) + F(n-3) + F(n-3) + F(n-4).\)

Let \(m\) be the smallest number that is large enough such that \(|p| < |F(m-3)|=f_{m-3}.\) Then for \(n \ge m\) there are four exclusive possibilities for \(p\):

  1. it is not contained in \(F(n),\) or
  2. it is contained in \(F(n-2),\) or
  3. it is contained in \(F(n-3),\) or
  4. it occurs in \(F(n-2)+F(n-3)\) and overlaps both, or
  5. it occurs in \(F(n-3)+F(n-2)\) and overlaps both.

We can now find a recurrence relation for the number of times \(p\) occurs in each of the above sub-cases. The first case is simply counted by \(O(n-2)\) and the second by \(O(n-3).\) Let \(O'(n)\) be the number of times \(p\) occurs in \(F(n) + F(n-1)\) and overlaps both, and \(O''(n)\) the number of times \(p\) occurs in \(F(n-1)+F(n)\) and overlaps both. Then for \(n > m\) we have

\begin{equation*} O(n) = O(n-1) + O(n-2) + O'(n-2) \end{equation*}

For \(O',\) given that \(n > m,\) we have

\begin{equation*} O'(n) = O''(n-1). \end{equation*}

Similarly, for \(O'',\) given that \(n > m,\) we have

\begin{equation*} O''(n) = O'(n-1). \end{equation*}

This is enough information to come up with a rather efficient algorithm:

  1. First find \(m\) as described above, by finding the first number such that \(|p|<f_{m-3}.\)
  2. Then find the number of occurrences of \(p\) in \(F(m) = F(n-2) + F(n-3) + F(n-2)\) using an efficient pattern matching algorithm such as KMP. Classify each occurrence as one of the sub-cases above, and by doing so find the values for \(O(m-3),\) \(O(n-2),\) \(O'(m-2)=O''(m-1),\) and \(O''(m-2)=O'(m-1)\) which are going to be our base cases.
  3. Now, based on the recurrences above, and in a bottom to top approach, calculate the value of \(O(n)\) for the given \(n.\)

Since \(f_{m-4} \le |p| < f_{m-3}\) we have \(|F(m)| = f_{m} \approx \phi^4|p|\) and so \(|F(m)| = O(|p|).\) So the KMP string matching part of this algorithm will take \(O(|p|)\) time and the rest will take \(O(n)\) time. Hence the total time is \(O(|p|+n)\) for finding a pattern of length \(|p|\) in \(F(n).\) This is definitely reasonable since \(p\) is much shorter compared to \(F(n).\)

Here's the Java code for the above algorithm. I use a KMP implementation that I translated over from an older Python project I did. I might write a post on KMP itself soon!

import java.util.LinkedList;
import java.util.Scanner;

public class Problem1282 {
    public static String[] F;
    public static long[] f;

    // Computes F up until F[m] such that M < F[m-3]
    public static void precomputeF(long M) {
        F = new String[101];
        F[0] = "0";
        F[1] = "1";
        for (int i = 1; i < 3 || F[i - 3].length() < M; i++) {
            F[i + 1] = F[i] + F[i - 1];
        }
    }

    public static void precomputef(int M) {
        f = new long[M + 1];
        f[0] = 1;
        f[1] = 1;

        for (int i = 1; i < M; i++) {
            f[i + 1] = f[i] + f[i - 1];
        }
    }

    public static long occurrences(int n, String p) {
        if (p.equals("0")) {
            if (n < 2) {
                return n == 0 ? 1 : 0;
            }
            return f[n - 2];
        }
        if (p.equals("1")) {
            if (n < 2) {
                return n == 0 ? 0 : 1;
            }
            return f[n - 1];
        }

        int m = 3;
        while (f[m - 3] <= p.length()) {
            m++;
        }

        if (m > n) {
            m = n;
        }

        long a = 0; // This will start with O(m-3)
        long b = 0; // And this one with O(m-2)
        long[] o = new long[2]; // Overlapping count

        // Only need to find occurrences up until index f[m-1] since past that
        // we are back to F(n-2) and since F(n) starts with F(n-2) we have
        // already found all the occurrences in F(n-2) already.
        LinkedList<Integer> occurrences = KMP.search(F[m], p, (int) f[m - 1]);
        for (int found : occurrences) {
            if (found < f[m - 2]) {
                if (found + p.length() - 1 < f[m - 2]) {
                    b++;
                } else {
                    o[0]++;
                }
            } else if (found < f[m - 1]) {
                if (found + p.length() - 1 < f[m - 1]) {
                    a++;
                } else {
                    o[1]++;
                }
            }
        }

        int i;
        for (i = 0; i <= n - m + 1; i++) {
            long old_b = b;
            b = b + a + o[i % 2];
            a = old_b;
        }

        return b;
    }

    public static void main(String[] args) {
        precomputeF(100000);
        precomputef(100);

        java.util.Scanner scanner = new Scanner(System.in);
        for (int c = 1; scanner.hasNext(); c++) {
            int n = Integer.parseInt(scanner.nextLine());
            String p = scanner.nextLine();
            System.out.format("Case %d: %d\n", c, occurrences(n, p));
        }
    }
}

class KMP {
    // Calculate the border array (i.e. failure function) of string x.
    // Direct translation of my older Python code
    private static int[] borderArray(String s) {
        char[] x = s.toCharArray();
        int n = x.length;
        int[] beta = new int[n];
        int b = 0;
        for (int i = 1; i < n; i++) {
            while (b > 0 && x[i] != x[b]) {
                b = beta[b - 1];
            }
            beta[i] = b = (x[i] == x[b] ? b + 1 : 0);
        }
        return beta;
    }

    // Return an array of indices where p occurs in x.
    // Overlapping matches are allowed. For example kmpSearch("ababa", "aba")
    // yields 0 and 2.
    public static LinkedList<Integer> search(String xx, String pp, int max_index) {
        char[] x = xx.toCharArray();
        char[] p = pp.toCharArray();
        int[] beta = borderArray(pp);
        int i = 0, m = 0;
        LinkedList<Integer> results = new LinkedList<Integer>();
        while (m + i < x.length) {
            if (m > max_index) {
                break;
            }
            if (x[m + i] == p[i]) {
                if (i == p.length - 1) {
                    results.add(m);
                } else {
                    i++;
                    continue;
                }
            }
            if (i > 0) {
                m = m + i - beta[i - 1];
                i = beta[i - 1];
            } else {
                m = m + 1;
                i = 0;
            }
        }

        return results;
    }
}

Comments