# Pattern Matching in Fibonacci Words (ACM ICPC World Finals 2012)

I was looking at ACM ICPC world final problems the other day when this particular problem caught my eyes. Having spent a few hours working on the Fibonacci primitive roots problem it felt like a good choice for a problem to try to tackle. You can find the source here. It is also on the UVA online judge as problem 1282.

Fibonacci words are defined by letting $F(0) = \mathtt{0},$ $F(1) = \mathtt{1}$ and $F(n)=F(n-1) + F(n-2)$ where $+$ is the string concatenation operation. Hence $F(n)$ is a string for $n \ge 0.$ Specifically, these strings are referred to as Fibonacci words.

The problem requires an efficient algorithm to find the number of occurrences of a given pattern $p$ in a given Fibonacci word $F(n).$ Noting that the length of Fibonacci words grows exponentially, at an exponential rate given by the golden ratio $\phi$ , generating the Fibonacci word in memory and then performing a search is simply not an option for large $n.$

For the rest of this post, let $f_n$ be the regular Fibonacci sequence given by $f_0=|F(0)|=1$ and $f_1=|F(1)=1$ and $f_n=f_{n-1}+f{n-2}$ for $n \ge 2.$

Let $O(n)$ be the number of occurrences of $p$ in $F(n).$

My solution involved first noticing the following about Fibonacci words:

1. The lengths of Fibonacci words are given by $|F(n)|=f_{n}.$
2. $F(n)$ is a prefix of $F(n+1).$
3. $F(n)$ is a suffix of $F(n+2).$ Hence $F(n)$ is a suffix of $F(n+2k)$ for all $k \ge 0.$
4. For $n \ge 3$ we have $F(n) = F(n-1) + F(n-2) = F(n-2) + F(n-3) + F(n-2).$
5. Similarly, $n \ge 4$ we have $F(n) = F(n-3) + F(n-4) + F(n-3) + F(n-3) + F(n-4).$

Let $m$ be the smallest number that is large enough such that $|p| < |F(m-3)|=f_{m-3}.$ Then for $n \ge m$ there are four exclusive possibilities for $p$ :

1. it is not contained in $F(n),$ or
2. it is contained in $F(n-2),$ or
3. it is contained in $F(n-3),$ or
4. it occurs in $F(n-2)+F(n-3)$ and overlaps both, or
5. it occurs in $F(n-3)+F(n-2)$ and overlaps both.

We can now find a recurrence relation for the number of times $p$ occurs in each of the above sub-cases. The first case is simply counted by $O(n-2)$ and the second by $O(n-3).$ Let $O'(n)$ be the number of times $p$ occurs in $F(n) + F(n-1)$ and overlaps both, and $O''(n)$ the number of times $p$ occurs in $F(n-1)+F(n)$ and overlaps both. Then for $n > m$ we have

$O(n) = O(n-1) + O(n-2) + O'(n-2)$

For $O',$ given that $n > m,$ we have

$O'(n) = O''(n-1).$

Similarly, for $O'',$ given that $n > m,$ we have

$O''(n) = O'(n-1).$

This is enough information to come up with a rather efficient algorithm:

1. First find $m$ as described above, by finding the first number such that $|p|
2. Then find the number of occurrences of $p$ in $F(m) = F(n-2) + F(n-3) + F(n-2)$ using an efficient pattern matching algorithm such as KMP. Classify each occurrence as one of the sub-cases above, and by doing so find the values for $O(m-3),$ $O(n-2),$ $O'(m-2)=O''(m-1),$ and $O''(m-2)=O'(m-1)$ which are going to be our base cases.
3. Now, based on the recurrences above, and in a bottom to top approach, calculate the value of $O(n)$ for the given $n.$

Since $f_{m-4} \le |p| < f_{m-3}$ we have $|F(m)| = f_{m} \approx \phi^4|p|$ and so $|F(m)| = O(|p|).$ So the KMP string matching part of this algorithm will take $O(|p|)$ time and the rest will take $O(n)$ time. Hence the total time is $O(|p|+n)$ for finding a pattern of length $|p|$ in $F(n).$ This is definitely reasonable since $p$ is much shorter compared to $F(n).$

Here's the Java code for the above algorithm. I use a KMP implementation that I translated over from an older Python project I did. I might write a post on KMP itself soon!

import java.util.LinkedList;
import java.util.Scanner;

public class Problem1282 {
public static String[] F;
public static long[] f;

// Computes F up until F[m] such that M < F[m-3]
public static void precomputeF(long M) {
F = new String;
F = "0";
F = "1";
for (int i = 1; i < 3 || F[i - 3].length() < M; i++) {
F[i + 1] = F[i] + F[i - 1];
}
}

public static void precomputef(int M) {
f = new long[M + 1];
f = 1;
f = 1;

for (int i = 1; i < M; i++) {
f[i + 1] = f[i] + f[i - 1];
}
}

public static long occurrences(int n, String p) {
if (p.equals("0")) {
if (n < 2) {
return n == 0 ? 1 : 0;
}
return f[n - 2];
}
if (p.equals("1")) {
if (n < 2) {
return n == 0 ? 0 : 1;
}
return f[n - 1];
}

int m = 3;
while (f[m - 3] <= p.length()) {
m++;
}

if (m > n) {
m = n;
}

long b = 0; // And this one with O(m-2)
long[] o = new long; // Overlapping count

// Only need to find occurrences up until index f[m-1] since past that
// we are back to F(n-2) and since F(n) starts with F(n-2) we have
LinkedList<Integer> occurrences = KMP.search(F[m], p, (int) f[m - 1]);
for (int found : occurrences) {
if (found < f[m - 2]) {
if (found + p.length() - 1 < f[m - 2]) {
b++;
} else {
o++;
}
} else if (found < f[m - 1]) {
if (found + p.length() - 1 < f[m - 1]) {
a++;
} else {
o++;
}
}
}

int i;
for (i = 0; i <= n - m + 1; i++) {
long old_b = b;
b = b + a + o[i % 2];
a = old_b;
}

return b;
}

public static void main(String[] args) {
precomputeF(100000);
precomputef(100);

java.util.Scanner scanner = new Scanner(System.in);
for (int c = 1; scanner.hasNext(); c++) {
int n = Integer.parseInt(scanner.nextLine());
String p = scanner.nextLine();
System.out.format("Case %d: %d\n", c, occurrences(n, p));
}
}
}

class KMP {
// Calculate the border array (i.e. failure function) of string x.
// Direct translation of my older Python code
private static int[] borderArray(String s) {
char[] x = s.toCharArray();
int n = x.length;
int[] beta = new int[n];
int b = 0;
for (int i = 1; i < n; i++) {
while (b > 0 && x[i] != x[b]) {
b = beta[b - 1];
}
beta[i] = b = (x[i] == x[b] ? b + 1 : 0);
}
return beta;
}

// Return an array of indices where p occurs in x.
// Overlapping matches are allowed. For example kmpSearch("ababa", "aba")
// yields 0 and 2.
public static LinkedList<Integer> search(String xx, String pp, int max_index) {
char[] x = xx.toCharArray();
char[] p = pp.toCharArray();
int[] beta = borderArray(pp);
int i = 0, m = 0;
while (m + i < x.length) {
if (m > max_index) {
break;
}
if (x[m + i] == p[i]) {
if (i == p.length - 1) {
} else {
i++;
continue;
}
}
if (i > 0) {
m = m + i - beta[i - 1];
i = beta[i - 1];
} else {
m = m + 1;
i = 0;
}
}

return results;
}
}