# Expected Running Time of Two Randomized Sort Algorithms (ACM ICPC 2013 PACNW Regional)

The last month was a very busy one, with the ACM ICPC regionals taking place last week. This post is based on a problem on the competition, "Crusher's Code". You can see the full list of problem statements here.

First, a brief explanation of the problem:

Two sort algorithms are given. Both take an input array $A$ of size $n$ as input. Algorithm Monty chooses random numbers $i$ and $j$ with $0 \le i, j < n,$ swapping them if necessary to ensure $i < j$ and then swaps $A[i]$ and $A[j]$ if $A[i] > A[j].$ Algorithm Carlos on the other hand picks a random $i$ with $0 \le i < n -1$ and swaps $A[i]$ and $A[i+1]$ if $A[i] > A[i+1].$ Both algorithms continue until the array is sorted. How many iterations are each expected to run for before terminating?

My solution to this problem started with coming up with a recurrence for the expected number of iterations in each case. For Monty we reduce the problem to a "smaller" problem as soon as we randomly come across a a pair $(i, j)$ that form an inversion of the array. Let's say the set of inversions of $A$ is denoted by $I(A).$ Then choosing a pair $(i, j)$ at random has a total of $n^2$ possibilities, of which exactly $2|I(A)|$ result in an inversion. Hence this can be seen as a Bernoulli process with success rate $p = \frac{2|I(A)|}{n^2}.$ The expected number of trials before a success is therefore simply $\frac{n^2}{2|I(A)|}.$

This gives us the following recursion for the expected number of iterations of Monty:

$E_M(A) = \frac{n^2}{2|I(A)|} + \sum_{(i, j) \in I(A)} \frac{1}{|I(A)|} E_M(A_{i,j})$

where by $A_{i,j}$ here I mean the array $A$ with its $i$ and $j$ indices swapped. The $\frac{1}{|I(A)}$ is the probability that any particular inversion is the one that we get to first during the sort.

Following a similar line of thought, for Carlos we can let $I'(A)$ denote the set of indices $i$ such that $A[i] > A[i+1].$ Then we get the following recurrence for the expected number of iterations of Carlos:

$E_C(A) = \frac{n - 1}{|I'(A)|} + \sum_{i \in I'(A)} \frac{1}{|I'(A)|} E_C(A_{i,i+1}).$

Of course, there are a lot of overlaps in the recurrence, so memoizing the values are key to a fast enough algorithm. During the competition I achieved this in Java by first getting the inversion sequence of the array, and the representing the inversion sequence as a base $n$ number, and then using this representation as a key for a hash-table to be used as a cache.

In Python, this can be done much simpler by just hashing with the array itself as the key, converted to a tuple. Here is an implementation in Python for the above algorithm. Note that $I(A)$ and $I'(A)$ are implemented as separate functions and passed to the main driver as parameters. The "cost" functions are also passed as parameters similarly, allowing for a simple driver function to calculate both numbers.

from __future__ import division
from sys import stdin

def expected_iterations(A, cache, candidates, cost):
n = len(A)
h = tuple(A)
if h in cache:
return cache[h]
E = 0.0
C = candidates(A)
if C:
for (i, j) in C:
A[i], A[j] = A[j], A[i]
E += expected_iterations(A, cache, candidates, cost)
A[i], A[j] = A[j], A[i]
E += cost(n)
E /= len(C)
cache[h] = E
return E

def inversions(A):
return [(i, j)
for i in xrange(len(A) - 1)
for j in xrange(i + 1, len(A))
if A[i] > A[j]]

return [(i, i + 1)
for i in xrange(len(A) - 1)
if A[i] > A[i + 1]]

monty_cost = lambda n: n * n / 2.0
carlos_cost = lambda n: n - 1

def process_test_case(A):
carlos_cache = {}
monty_cache = {}
m = expected_iterations(A, monty_cache, inversions, monty_cost)
c = expected_iterations(A, carlos_cache, adj_inversions, carlos_cost)
print 'Monty {:.6f} Carlos {:.6f}'.format(m, c)

def main():