Expected Running Time of Two Randomized Sort Algorithms (ACM ICPC 2013 PACNW Regional)

The last month was a very busy one, with the ACM ICPC regionals taking place last week. This post is based on a problem on the competition, "Crusher's Code". You can see the full list of problem statements here.

First, a brief explanation of the problem:

Two sort algorithms are given. Both take an input array $A$ of size $n$ as input. Algorithm Monty chooses random numbers $i$ and $j$ with $0 \le i, j < n,$ swapping them if necessary to ensure $i < j$ and then swaps $A[i]$ and $A[j]$ if $A[i] > A[j].$ Algorithm Carlos on the other hand picks a random $i$ with $0 \le i < n -1$ and swaps $A[i]$ and $A[i+1]$ if $A[i] > A[i+1].$ Both algorithms continue until the array is sorted. How many iterations are each expected to run for before terminating?

My solution to this problem started with coming up with a recurrence for the expected number of iterations in each case. For Monty we reduce the problem to a "smaller" problem as soon as we randomly come across a a pair $(i, j)$ that form an inversion of the array. Let's say the set of inversions of $A$ is denoted by $I(A).$ Then choosing a pair $(i, j)$ at random has a total of $n^2$ possibilities, of which exactly $2|I(A)|$ result in an inversion. Hence this can be seen as a Bernoulli process with success rate $p = \frac{2|I(A)|}{n^2}.$ The expected number of trials before a success is therefore simply $\frac{n^2}{2|I(A)|}.$

This gives us the following recursion for the expected number of iterations of Monty:

E_M(A) = \frac{n^2}{2|I(A)|} + \sum_{(i, j) \in I(A)} \frac{1}{|I(A)|} E_M(A_{i,j})

where by $A_{i,j}$ here I mean the array $A$ with its $i$ and $j$ indices swapped. The $\frac{1}{|I(A)}$ is the probability that any particular inversion is the one that we get to first during the sort.

Following a similar line of thought, for Carlos we can let $I'(A)$ denote the set of indices $i$ such that $A[i] > A[i+1].$ Then we get the following recurrence for the expected number of iterations of Carlos:

E_C(A) = \frac{n - 1}{|I'(A)|} + \sum_{i \in I'(A)} \frac{1}{|I'(A)|} E_C(A_{i,i+1}).

Of course, there are a lot of overlaps in the recurrence, so memoizing the values are key to a fast enough algorithm. During the competition I achieved this in Java by first getting the inversion sequence of the array, and the representing the inversion sequence as a base $n$ number, and then using this representation as a key for a hash-table to be used as a cache.

In Python, this can be done much simpler by just hashing with the array itself as the key, converted to a tuple. Here is an implementation in Python for the above algorithm. Note that $I(A)$ and $I'(A)$ are implemented as separate functions and passed to the main driver as parameters. The "cost" functions are also passed as parameters similarly, allowing for a simple driver function to calculate both numbers.

from __future__ import division
from sys import stdin


def expected_iterations(A, cache, candidates, cost):
    n = len(A)
    h = tuple(A)
    if h in cache:
        return cache[h]
    E = 0.0
    C = candidates(A)
    if C:
        for (i, j) in C:
            A[i], A[j] = A[j], A[i]
            E += expected_iterations(A, cache, candidates, cost)
            A[i], A[j] = A[j], A[i]
        E += cost(n)
        E /= len(C)
    cache[h] = E
    return E


def inversions(A):
    return [(i, j)
            for i in xrange(len(A) - 1)
            for j in xrange(i + 1, len(A))
            if A[i] > A[j]]


def adj_inversions(A):
    return [(i, i + 1)
            for i in xrange(len(A) - 1)
            if A[i] > A[i + 1]]


monty_cost = lambda n: n * n / 2.0
carlos_cost = lambda n: n - 1


def process_test_case(A):
    carlos_cache = {}
    monty_cache = {}
    m = expected_iterations(A, monty_cache, inversions, monty_cost)
    c = expected_iterations(A, carlos_cache, adj_inversions, carlos_cost)
    print 'Monty {:.6f} Carlos {:.6f}'.format(m, c)


def main():
    T = int(stdin.readline())
    for __ in xrange(T):
        A = [int(x) for x in stdin.readline().split(' ')]
        # ignore the first element since is is length of the array
        process_test_case(A[1:])


if __name__ == '__main__':
    main()

Math ∪ Code

by Sahand Saba

Expected Running Time of Two Randomized Sort Algorithms (ACM ICPC 2013 PACNW Regional)

Comments