Basics of Cryptography Part I: RSA Encryption and Decryption

Cryptography and computer security have always been side interests for me. Recently, while I was reading about RSA encryption, I thought about how well I understood the algorithm. For me, the true test of understanding something is always whether I can code it from scratch. While this goes against the "don't reinvent the wheel" principle of software development (a principle I hold very dear when it comes to professional work), reinventing the wheel can be a great way to explore and learn.

What follows is a basic walk-through of RSA encryption, accompanied by a simple (but relatively efficient) Python implementation of it.

Some basic group theory and number theory knowledge is needed. I provide some of the basics of the required group theory in the appendix. Most of it is there for the mathematically minded, and can be skipped by anyone not too interested in the mathematical proofs.

Disclaimer: None of the code in this article is to be used in a real application. There are many, many considerations that go into the creation of production-ready cryptographic software that are not considered here. The purpose of the code in this article is to build a bare-minimum implementation of basic, deterministic RSA encryption and decryption. This article is for educational purposes only.

An Overview of the RSA Cryptosystem

The RSA cryptosystem is an asymmetric cryptosystem that relies on the following intuitive observations, where by easy we mean asymptotically fast algorithms for the mentioned problem exist, and by difficult we mean no asymptotically fast algorithm is known:

  1. It is easy to find two distinct large primes $p$ and $q$.
  2. It is easy to multiply two large primes together, calculating $n=pq$.
  3. However, given $n=pq$, it is difficult to factor out $p$ and $q$.
  4. Given $a$, $n$ and $e$, with $0<a<n$ and $e>1$, calculating $a^e \pmod{n}$ is easy.
  5. But inversely, given $a^e \pmod{n}$, $e$ and $n$, it is difficult to calculate $a$.
  6. However, if the multiplicative inverse of $e$ modulo $\phi(n)=(p-1)(q-1)$, labelled $d$, is given, then the previous calculation is easy. (The multiplicative inverse satisfies $ed \equiv 1 \pmod{\phi(n)}$.)
  7. And finally, calculating $d$ from only $n$ and $e$ is difficult, but easy if the factorization of $n=pq$ is known.

Putting all of this together, we get a scheme for an asymmetric cryptosystem as follows.

Key Generation: Generate distinct primes $p$ and $q$ and let $n=pq$. Also let $m=\phi(n)=(p-1)(q-1)$ and pick any $1<e<m$ with $\gcd(e, m)=1$ (so that the inverse exists). Calculate $d$ as the multiplicative inverse of $e$ modulo $m$ using the extended Euclidean algorithm. The private key is then $(n, d)$ and the public key is $(n, e)$.

Encryption: To encrypt plaintext, first encode the plaintext (by padding it, for example) so that it consists of blocks of size less than $n$. Then for a block $a$, define $E(a)=a^e \pmod{n}$ as the encryption function.

Decryption: To decrypt ciphertext given by $c = E(a)$, define $D(c) = c^d \pmod{n}$. We then have $D(c)=D(E(a))=a^{ed} \pmod{n}$. For this to work, we need $a^{ed} \equiv a \pmod{n}$, and the theorem below shows that the encryption and decryption functions indeed satisfy this.
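
To make the scheme concrete before proving it correct, here is a toy example with illustrative values of my own choosing ($p=11$, $q=13$, $e=7$; real keys use primes hundreds of digits long, and real RSA also requires proper padding):

```python
# Illustrative toy key: p = 11, q = 13 (my choice, for demonstration only).
p, q = 11, 13
n = p * q                 # n = 143
m = (p - 1) * (q - 1)     # phi(n) = 120
e = 7                     # gcd(7, 120) = 1
d = 103                   # 7 * 103 = 721 = 6 * 120 + 1, so d = e^-1 mod 120

a = 9                     # plaintext block, 0 < a < n
c = pow(a, e, n)          # E(a) = a^e mod n
assert c == 48
assert pow(c, d, n) == a  # D(E(a)) recovers the plaintext
```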

To prove correctness, the key is to show that $a^{ed} \equiv a \pmod{n}$ for any $0<a<n$.


Let $p,q,n,m,e$ and $d$ be given as above. Then for any $0<a<n$, $a^{ed} \equiv a \pmod{n}$. Proof First, assume that $\gcd(a, n) = 1$. This means that $a$ is in $Z^*_n$, the multiplicative group of integers modulo $n$ that are coprime to $n$. Since $|Z^*_n| = m = \phi(n)$, we have $a^m \equiv 1 \pmod{n}$, because for any finite group $G$ and any element $a\in G$ we have $a^{|G|} = e_G$, and $e_G=1$ in this case. (For a proof of this, see the appendix on group theory basics.) Since $ed \equiv 1 \pmod{m}$, there exists some $t$ such that $ed = tm + 1$. This gives $a^{ed}=a^{tm+1}=(a^m)^t a \equiv a \pmod{n}$.

The case where $\gcd(a, n) \ne 1$ means either $p$ divides $a$ or $q$ does, but not both, since then $pq=n$ would divide $a$, which cannot be since we picked $0<a<n$. Let's assume $p$ does, and let $a=hp$ with $\gcd(q, h) = 1$. Then $a^{ed} \equiv 0 \equiv a \pmod{p}$, since $p$ divides $a$. Also, writing $ed = tm+1 = t(p-1)(q-1)+1$ and noting that $q$ divides neither $p$ nor $h$, Fermat's little theorem gives $a^{q-1} = (ph)^{q-1} \equiv 1 \pmod{q}$, so $a^{ed} = a\,(a^{q-1})^{t(p-1)} \equiv a \pmod{q}$. So $a^{ed}$ and $a$ have the same remainder modulo both $p$ and $q$, and by the Chinese remainder theorem the two are then equivalent modulo $n=pq$. The case of $q$ dividing $a$ is symmetric. $\blacksquare$

The Fun Part: Implementation and Practicalities

Let's put the pieces of the puzzle together now and implement a simple RSA cryptosystem in Python. As we will see, each piece of the implementation here ends up being related to interesting and well-studied areas of mathematics and computer science.

Key Generation and Primality Tests

To start, our key generation requires us to generate two primes $p$ and $q$. We noted above that if $n=pq$ is small enough then an attacker can simply factor $n$ to get the two primes, use them to calculate $\phi(n)$, and from there calculate $d=e^{-1}$. Hence we need $n$, and therefore $p$ and $q$, to be large.

To generate a large prime $p$, we start by deciding the number of bits we require in the prime. This number is one of the key parameters determining the strength of our cryptosystem. Let's call this number $b$, and let's assume for simplicity that $b>8$ for all our intended purposes. In practice, $b$ is usually between $512$ and $2048$. As I typed this, I decided to check Facebook and GMail to see what key strength they use in their secure TLS connections. Let's look at Facebook first:

OpenSSL> s_client -connect
New, TLSv1/SSLv3, Cipher is AES128-GCM-SHA256
Server public key is 2048 bit

So $2048$-bit keys are used here. This means each prime is $1024$ bits long. And GMail:

OpenSSL> s_client -connect
New, TLSv1/SSLv3, Cipher is ECDHE-RSA-AES128-GCM-SHA256
Server public key is 1024 bit

So GMail is using $1024$-bit keys, meaning each prime is $512$ bits long.

To generate a $b$-bit prime $p$ we start by generating $b$ bits of random data, and then manually make sure both the leading and trailing bits are ones, to ensure the number is greater than $2^{b-1}$ (we don't want to generate, say, $3$ by chance, no matter how small the probability) and that it is odd. (Side note on the entropy of the primes generated: forcing these bits to be $1$ reduces the true entropy of the primes; more on this in a bit.) Let $t$ denote the number given by the manipulated random bits as described above. Then we proceed to check $t+i$ for primality, starting from $i=0$ and onwards. As soon as a prime is reached, we let $p$ equal it. It is generally a good idea to give up on an interval after a certain number of attempts. To do this effectively, we need to consider the Prime Number Theorem which, letting $\pi(n)$ denote the number of primes less than or equal to $n$, tells us:

\lim_{n\to\infty}\frac{\pi(n)}{n/\ln(n)}=1.

In other words, there are roughly $\frac{n}{\ln(n)}$ primes smaller than or equal to $n$, and the larger $n$ gets, the better this estimate becomes. This translates directly to the heuristic that around $2^b$, for $b>8$ or so, a prime appears in about every $b$ consecutive numbers. However, this is only a heuristic, and large gaps between prime numbers exist. To avoid being trapped in one of these large gaps, we give up on $t$ after, say, $2b$ numbers consecutive to it have been tested and failed. In such a case, we simply generate $b$ new random bits and repeat the process all over. The end result looks like this:

import itertools
import random

def generate_random_prime(bits, primality_test):
    """Generate a random prime number with the given number of bits."""
    # Force the leading and trailing bits to 1 so the number is large
    # enough and odd.
    get_random_t = lambda: random.getrandbits(bits) | 1 << bits | 1
    p = get_random_t()
    for i in itertools.count(1):
        if primality_test(p):
            return p
        else:
            if i % (bits * 2) == 0:
                # Give up on this interval; try a fresh random number.
                p = get_random_t()
            else:
                p += 2  # Add 2 since we are only interested in odd numbers

This function gets into an infinite loop if there are no $b$-bit prime numbers. This can only happen for small $b$, so we ignore this possible source of bugs at this point in favour of an efficient implementation.

Next, we notice in the above that the primality test is the very core of the algorithm. Let's start by trying a very simple primality test and measuring its performance. To measure the performance, the logged decorator from the Python decorators post is used. First the primality test:

@logged("%b %d %Y - %H:%M:%S")
def simple_is_prime(n):
    """Returns True if n is a prime. False otherwise."""
    if n % 2 == 0:
        return n == 2
    if n % 3 == 0:
        return n == 3
    k = 6L
    while (k - 1) ** 2 <= n:
        if n % (k - 1) == 0 or n % (k + 1) == 0:
            return False
        k += 6
    return True

This is a somewhat more evolved version of the algorithm that checks $n$ for divisibility by all numbers smaller than or equal to $\lfloor \sqrt{n}\rfloor$. Our version first tests $2$ and $3$, and afterwards only numbers of the form $k=6i\pm 1$, since these are the only candidates not divisible by $2$ or $3$. Let's look at the efficiency:

>>> for b in [8, 16, 32, 44, 45, 46]:
...     print "Generating %d-bit prime: " % b
...     print generate_random_prime(b, simple_is_prime)
Generating 8-bit prime:
- Running 'generate_random_prime' on Aug 21 2013 - 13:53:42
- Finished 'generate_random_prime', execution time = 0.000s
Generating 16-bit prime:
- Running 'generate_random_prime' on Aug 21 2013 - 13:53:42
- Finished 'generate_random_prime', execution time = 0.000s
Generating 32-bit prime:
- Running 'generate_random_prime' on Aug 21 2013 - 13:53:42
- Finished 'generate_random_prime', execution time = 0.027s
Generating 44-bit prime:
- Running 'generate_random_prime' on Aug 21 2013 - 13:53:42
- Finished 'generate_random_prime', execution time = 0.876s
Generating 45-bit prime:
- Running 'generate_random_prime' on Aug 21 2013 - 13:53:43
- Finished 'generate_random_prime', execution time = 1.139s
Generating 46-bit prime:
- Running 'generate_random_prime' on Aug 21 2013 - 13:53:44
- Finished 'generate_random_prime', execution time = 2.215s

On my computer, around $b=42$ is when the time required to generate a prime becomes noticeable. Of course, a single run is not a very accurate measurement, because of the randomness in the generation algorithm. However, we can easily see that while the algorithm above is of order $O(\sqrt{n})$ in $n$, that corresponds to $O(2^{b/2})$ in $b$. In other words, it is exponential in $b$.

Fortunately, the problem of determining whether a number is prime or not is in $P$. This means there is a deterministic algorithm for it that runs in polynomial time (polynomial in $b$, not in $n$, as discussed above). The first deterministic polynomial-time algorithm to accomplish this is called AKS, named after the authors of the seminal paper "PRIMES is in P" by Agrawal, Kayal, and Saxena. You can read the original paper here.

However, a practical implementation of this algorithm ends up being quite inefficient, though still better than our simple algorithm above for large numbers. One of the better algorithms in practice is Rabin-Miller, which is also polynomial time in $b$, but unlike AKS, Rabin-Miller is not deterministic. This means the algorithm makes use of random numbers, and there is a chance it might not return the correct result. With the proper choice of parameters, though, Rabin-Miller can reduce the probability of a false positive to astronomically small, while still being quite efficient.

The basic idea of the Rabin-Miller algorithm is to notice that if $n$ is prime then for any $0<x<n$ we have $x^{n-1}\equiv 1 \pmod{n}$, as we already saw. Let $n-1=2^sm$ where $m$ is odd. Now consider the finite sequence of $s+1$ numbers

x_i = x^{2^im} \mod n \quad \text{ for } 0 \le i \le s.

Notice that if $n$ is prime, the last number in the sequence is $x_s=x^{2^sm}=x^{n-1}\equiv 1 \pmod{n}$. The sequence is defined so that each number is the previous number squared, modulo $n$; that is, $x_{i+1}=x_i^2 \bmod n$. Because of this, notice that if $x_0=1$ or $x_0=n-1$ then $x_i=1$ for all $i>0$, and there is not much information we can gain from this. Suppose that is not the case. Then, still assuming $n$ is prime, $2$ divides the multiplicative order of $x$ (because $x^{2^im} \equiv 1 \pmod{n}$ for some $i>0$ while $x^m \not\equiv 1 \pmod{n}$), which means that $-1$ is in the multiplicative subgroup generated by $x$. This follows from the fact that $-1$ is the unique element of order $2$ modulo a prime $n$ (because $x^2-1$ cannot have more than two roots modulo a prime) together with Cauchy's theorem. Hence we must have $x_i \equiv -1 \pmod{n}$ for some $i$. If no such $i$ exists, then $n$ is certainly not a prime. This is the main condition used in Rabin-Miller.

If $x$ gives us a sequence $x_i$ that does not satisfy the above requirement, then $x$ is said to be a witness to $n$'s compositeness, because we can use $x$ to prove $n$ is composite. That is, $x$ testifies that $n$ is composite, hence the choice of word.

In algorithm form we get the following.

Rabin-Miller Algorithm
Given $n$, first calculate $s$ and $m$ such that $n-1= 2^sm$ and $m$ is odd. Then pick a random $0 < x < n$ and calculate $x_0=x^m \bmod n$. If $x_0$ is either $1$ or $n-1$ then $x$ is not a witness; try another $x$. Otherwise, calculate $x_1=x_0^2 \bmod n$, $x_2=x_1^2 \bmod n$, and so on up to $x_s=x_{s-1}^2 \bmod n$, by squaring the previous term modulo $n$. If $1$ is encountered before $n-1$, or $n-1$ is never encountered at all, then $x$ is a witness and $n$ is composite. Otherwise $x$ is not a witness; try another $x$. Once "enough" $x$'s have been tried without any witnesses being found, we conclude that $n$ is "most likely" a prime.
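
To see these sequences concretely, here is a small illustrative trace (the specific values $n=561$ and $x=2$ are my choice; $561$ is a Carmichael number, which fools the plain Fermat test but not Rabin-Miller):

```python
# n = 561 = 3 * 11 * 17 is a Carmichael number: 2^560 = 1 (mod 561),
# so the Fermat test alone cannot expose it with x = 2.
n, x = 561, 2
assert pow(x, n - 1, n) == 1

# Write n - 1 = 2^s * m with m odd: 560 = 2^4 * 35.
s, m = 0, n - 1
while m % 2 == 0:
    s += 1
    m //= 2
assert (s, m) == (4, 35)

# The Rabin-Miller sequence x_i = x^(2^i * m) mod n.
seq = [pow(x, 2 ** i * m, n) for i in range(s + 1)]
# The sequence reaches 1 without ever passing through n - 1 = 560,
# so x = 2 is a witness and 561 is composite.
assert seq[-1] == 1 and (n - 1) not in seq
```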

Why is it that at the end we can't conclude with certainty that $n$ is a prime? Suppose we pick an $x$ and the generated sequence $x_i$ satisfies the requirements. Does that mean $n$ is a prime? The answer is no, because some composites will also satisfy the requirement for some choices of $x$. However, if we pick many possible witnesses and try them all, and they all fail to testify against $n$, then we have more and more evidence toward $n$ being prime. If we checked all such $x$ we could be certain, but then the algorithm would be exponential in $b$ again, which is what we are avoiding. However, if we had some idea of the probability of $x$ being a witness for a composite $n$, then we could come up with a reasonable number of random attempts at witnesses before declaring that $n$ is almost certainly a prime.

And this is the key part of Rabin and Miller's work: they prove that for a given composite $n$, at least $\frac{3}{4}$ of the numbers $0<x<n$ are witnesses to $n$ being composite. (The proof of this claim is rather involved, but have a look at the original paper if you are feeling brave!) This means that if $n$ is composite and we pick $k$ random possible witnesses, the probability that none of them is a witness is at most $(\frac{1}{4})^k$. So picking $k=50$ gives a probability of at most $(\frac{1}{2})^{100}$ of outputting a false positive, that is, outputting PRIME when the number is composite. This is good enough for our purposes here.
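
The $\frac{3}{4}$ bound can be checked by brute force for small numbers. The helper below is an illustrative sketch of the witness test (my code, not the article's final implementation), counting witnesses among all candidates:

```python
def count_witnesses(n):
    """Count x in (0, n) that are Rabin-Miller witnesses against n."""
    s, m = 0, n - 1
    while m % 2 == 0:
        s += 1
        m //= 2
    witnesses = 0
    for x in range(1, n):
        xi = pow(x, m, n)
        if xi == 1 or xi == n - 1:
            continue  # x is not a witness
        for _ in range(s - 1):
            xi = (xi * xi) % n
            if xi == n - 1:
                break  # x is not a witness
        else:
            witnesses += 1  # n - 1 never appeared: x is a witness
    return witnesses

# For a composite, at least 3/4 of the candidates 0 < x < n are witnesses.
assert count_witnesses(21) >= 3 * (21 - 1) // 4
# For a prime, there are no witnesses at all.
assert count_witnesses(13) == 0
```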

Also, in practice, one can make the Rabin-Miller algorithm much more efficient by implementing a basic test of divisibility by the first, say, $100$ primes before moving on to Rabin-Miller.

Let's look at a Python implementation below.

def rabin_miller_is_prime(n, k=20):
    """Test n for primality using the Rabin-Miller algorithm, with k
    random witness attempts. A False return value means n is certainly
    a composite. A True return value indicates n is *probably* a prime.
    The false positive probability shrinks exponentially as k grows."""
    b = basic_is_prime(n, K=100)
    if b is not None:
        return b

    m = n - 1
    s = 0
    while m % 2 == 0:
        s += 1
        m //= 2
    liars = set()
    get_new_x = lambda: random.randint(2, n - 1)
    while len(liars) < k:
        x = get_new_x()
        while x in liars:
            x = get_new_x()
        xi = pow(x, m, n)
        witness = True
        if xi == 1 or xi == n - 1:
            witness = False
        else:
            for __ in xrange(s - 1):
                xi = (xi ** 2) % n
                if xi == 1:
                    # 1 reached before n - 1: x is a witness.
                    return False
                elif xi == n - 1:
                    witness = False
                    break
        if witness:
            return False
        liars.add(x)
    return True

In the code above, basic_is_prime simply checks the number for divisibility by $2$ and by the first K odd primes. Here's the source for it, which relies on a list primes_list.less_than_hundred_thousand having been set already.

def basic_is_prime(n, K=-1):
    """Returns True if n is a prime, and False if it is a composite,
    by trying to divide it by two and the first K odd primes. Returns
    None if the test is inconclusive."""
    if n % 2 == 0:
        return n == 2
    for p in primes_list.less_than_hundred_thousand[:K]:
        if n % p == 0:
            return n == p
    return None

Another part of the algorithm above that we get for free in Python, but that is worth studying a bit further, is the pow(x, m, n) function, which calculates $x^m \pmod{n}$ efficiently. How? The naive algorithm would simply multiply $x$ by itself $m$ times, but Python's pow function is smarter than that, and uses a method known as "exponentiation by squaring", which takes $O(\log m)$ multiplications. The basic idea is simple:

x^m= \begin{cases} (x^{\frac{m}{2}})^2 & \text{ if } m \text{ is even} \\ x \cdot (x^{\frac{m-1}{2}})^2 & \text{ if } m \text{ is odd}. \end{cases}

This leads to a divide-and-conquer algorithm. Here is what a simple iterative version of it looks like in Python. (In practice, use the built-in pow(x, m, n) function instead, since it is going to be faster.)

def power(x, m, n):
    """Calculate x^m modulo n using O(log(m)) operations."""
    a = 1
    while m > 0:
        if m % 2 == 1:
            a = (a * x) % n
        x = (x * x) % n
        m //= 2
    return a
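
For comparison, the recurrence above can also be written recursively. This is an illustrative sketch (the name power_recursive is mine, not part of the article's code):

```python
def power_recursive(x, m, n):
    """Calculate x^m modulo n using the even/odd recurrence above."""
    if m == 0:
        return 1 % n
    half = power_recursive(x, m // 2, n)  # x^(m//2) mod n
    if m % 2 == 0:
        return (half * half) % n
    return (x * half * half) % n

# Agrees with the naive computation and with the built-in pow.
assert power_recursive(3, 20, 7) == pow(3, 20, 7) == 3 ** 20 % 7
```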

Now, let's try generating some primes using this test instead:

>>> for b in [2 ** i for i in xrange(3, 12)]:
...    print "Generating %d-bit prime: " % b
...    print generate_random_prime(b, rabin_miller_is_prime)
Generating 8-bit prime:
- Running 'generate_random_prime' on Aug 28 2013 - 20:32:32
- Finished 'generate_random_prime', execution time = 0.000s
Generating 16-bit prime:
- Running 'generate_random_prime' on Aug 28 2013 - 20:32:32
- Finished 'generate_random_prime', execution time = 0.001s
Generating 32-bit prime:
- Running 'generate_random_prime' on Aug 28 2013 - 20:32:32
- Finished 'generate_random_prime', execution time = 0.002s
Generating 64-bit prime:
- Running 'generate_random_prime' on Aug 28 2013 - 20:32:32
- Finished 'generate_random_prime', execution time = 0.003s
Generating 128-bit prime:
- Running 'generate_random_prime' on Aug 28 2013 - 20:32:32
- Finished 'generate_random_prime', execution time = 0.008s
Generating 256-bit prime:
- Running 'generate_random_prime' on Aug 28 2013 - 20:32:32
- Finished 'generate_random_prime', execution time = 0.013s
Generating 512-bit prime:
- Running 'generate_random_prime' on Aug 28 2013 - 20:32:32
- Finished 'generate_random_prime', execution time = 0.062s
Generating 1024-bit prime:
- Running 'generate_random_prime' on Aug 28 2013 - 20:32:32
- Finished 'generate_random_prime', execution time = 0.908s
Generating 2048-bit prime:
- Running 'generate_random_prime' on Aug 28 2013 - 20:32:33
- Finished 'generate_random_prime', execution time = 5.533s

As you can see, primes up to $512$ bits are generated in negligible time. Even a $2048$-bit prime is generated in about five seconds, which is quite impressive for a rather simple Python implementation.

Finally, let's put it all together to implement RSA, for which we will need an implementation of the extended greatest common divisor algorithm to determine the modular multiplicative inverse of a number.

def extended_gcd(a, b):
    """Returns pair (x, y) such that xa + yb = gcd(a, b)"""
    x, lastx, y, lasty = 0, 1, 1, 0
    while b != 0:
        q, r = divmod(a, b)
        a, b = b, r
        x, lastx = lastx - q * x, x
        y, lasty = lasty - q * y, y
    return lastx, lasty

def multiplicative_inverse(e, n):
    """Find the multiplicative inverse of e mod n."""
    x, y = extended_gcd(e, n)
    if x < 0:
        return n + x
    return x
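
As a quick sanity check of what multiplicative_inverse computes, note that modern Python can do this natively: since Python 3.8, the built-in pow accepts an exponent of -1 with a modulus. (The article's code targets Python 2, so this is only an aside.)

```python
# Python 3.8+: pow(e, -1, m) is the multiplicative inverse of e modulo m.
# For example, 7 * 23 = 161 = 4 * 40 + 1, so 23 = 7^-1 (mod 40).
assert pow(7, -1, 40) == 23
assert (7 * pow(7, -1, 40)) % 40 == 1
```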

def rsa_generate_key(bits):
    p = generate_random_prime(bits / 2, rabin_miller_is_prime)
    q = generate_random_prime(bits / 2, rabin_miller_is_prime)
    # Ensure q != p, though for large values of bits this is
    # statistically very unlikely
    while q == p:
        q = generate_random_prime(bits / 2, rabin_miller_is_prime)
    n = p * q
    phi = (p - 1) * (q - 1)
    # Here we pick a random e, but a fixed value for e can also be used.
    while True:
        e = random.randint(3, phi - 1)
        if fractions.gcd(e, phi) == 1:
            break
    d = multiplicative_inverse(e, phi)
    return (n, e, d)

def rsa_encrypt(message, n, e):
    return modular.power(message, e, n)

def rsa_decrypt(cipher, n, d):
    return modular.power(cipher, d, n)

And let's look at it in action:

>>> n, e, d = rsa_generate_key(8)
>>> print n, e, d
589 299 419
>>> message = 123
>>> cipher = rsa_encrypt(message, n, e)
>>> print cipher
309
>>> decrypted_message = rsa_decrypt(cipher, n, d)
>>> print decrypted_message
123

And same thing but with a much larger key, giving a more secure encryption scheme:

>>> n, e, d = rsa_generate_key(64)
>>> print n, e, d
49386969680342799959 39516096781065686051 14613413672463512651
>>> message = 123
>>> cipher = rsa_encrypt(message, n, e)
>>> print cipher
>>> decrypted_message = rsa_decrypt(cipher, n, d)
>>> print decrypted_message
123

And there it is: a very basic implementation of RSA that is still capable of handling rather large keys. Notice that the encryption and decryption algorithms are basically just modular exponentiation. Because of its importance to RSA's efficiency, modular exponentiation has been studied quite a bit in applied cryptography. A more efficient implementation would use Montgomery reduction, which I might go into in a future post.

Appendix: Basic Number Theory Disguised as Group Theory

A few results from group theory that we use above are reviewed and proved here. Most of these can also be expressed as results in number theory in less abstract forms, but let's be honest, they look way cooler and are much easier to remember when they are written in the language of group theory.

Remember that for an element $a$ in a group $G$, we define the order of $a$, denoted $|a|$, to be the smallest positive integer such that $a^{|a|}=e_G$ if such a number exists, and $\infty$ otherwise.

For any element $a$ in a group $G$, $a^k=e_G$ iff $|a|$ divides $k$. Proof If $k=t|a|$ for some integer $t$ then $a^k=a^{t|a|}=(a^{|a|})^t=e_G^t=e_G$. Conversely, suppose $a^k=e_G$. Let $q, r$ be such that $k=q|a|+r$ and $0\le r<|a|$. Then $e_G = a^{k}=a^{q|a|+r}=a^{q|a|}a^r=e_G a^r=a^r$. Hence $a^r=e_G$, which by the minimality of $|a|$ means $r=0$, giving $k=q|a|$. $\blacksquare$
Given a group $G$, if $a\in G$ has finite order then so does $a^k$ for any integer $k$, and furthermore
|a^k| = \frac{|a|}{\gcd(|a|, k)}.

Proof Let $n = |a|$, $t = |a^k|$, $d=\gcd(n, k)$ and $m=\frac{n}{d}$. We need to show $t=m$. First, since $d$ divides both $n$ and $k$, we can write $n=md$ and $k=ld$, giving

(a^k)^m = a^{mk} = a^{\frac{n}{d}ld}=a^{nl}=(a^n)^l = e_G^{l} = e_G.

This means $t$ divides $m$. On the other hand, $(a^k)^t= a^{kt} = e_G$ by the definition of $t$, so by the previous theorem $n$ divides $kt$. So $kt = ns$ for some integer $s$. This gives $ldt = mds$, which simplifies to $lt = ms$ by cancelling $d$ from both sides. Now, $m$ and $l$ are relatively prime, since we already cancelled out their greatest common divisor $d$. Since $m$ divides $lt$, it must then divide $t$. So we have $m$ dividing $t$ and $t$ dividing $m$, giving equality, as needed. $\blacksquare$
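
The order formula is easy to check numerically for a small cyclic group. Here is an illustrative check in $Z_{13}^*$ (the helper order is mine; $2$ is a generator modulo $13$, so $|2|=12$):

```python
from math import gcd

def order(a, p):
    """Multiplicative order of a modulo prime p."""
    t, x = 1, a % p
    while x != 1:
        x = (x * a) % p
        t += 1
    return t

p = 13
assert order(2, p) == 12  # 2 generates Z_13^*, so |2| = 12
for k in range(1, 13):
    # Check |a^k| = |a| / gcd(|a|, k) for a = 2.
    assert order(pow(2, k, p), p) == 12 // gcd(12, k)
```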

We only state the following important result from group theory and use it without proof. A proof can be found in most basic group theory textbooks, and is achieved by considering the cosets of the subgroup $H$.

Lagrange's Theorem
For any finite group $G$ and any subgroup $H$ of $G$, we have $|H|$ divides $|G|$.

Corollary For any finite group $G$ and any $a\in G$, we have $a^{|G|}=e_G$. Proof By Lagrange's theorem, $|a| = |\langle{a}\rangle|$ divides $|G|$. The result then follows from the first theorem in this section. $\blacksquare$