Prime Factors and Run-Length Compression
Dan M. Piehl
November 2, 2003

Introduction

The following mathematical construction is designed to create specialized correspondences between integers and sequences of integers.

Exploring the Sequence of Primes

In this section, I will refer to prime numbers. The primes are positive integers which are divisible only by one and themselves. I will not consider 1 to be prime. Integers which can be factored into a product of primes are called composite integers. For example, 12 = 3 * 4 indicates that 12 is a composite integer. Additionally, when an integer is expressed as a product of primes only (ex. 12 = 2 * 3 * 3 ), it is referred to as the prime factorization. Although the primes in the factorization can be arranged in any order, the collection primes and their quantity in the factorization is unique.

It can also be shown that there are infinitely many primes. In the following discussion, I will define the sequence P(n) to be the nth prime in order of occurrence in the positive integers. Thus, the function maps all positive integers to the primes :

P(1) = 2 , P(2) = 3, P(3) = 5, P(4) = 7, P(5) = 11, etc...

Because all integers have a unique prime factorizations (ignoring order of the primes), we can always express any positive integer as a unique sequence, S(n), of powers of each of the primes in order. I will refer to this sequence as the prime-exponent sequence of a number. For example, the prime-exponent sequence of the number 12 is {2,1,0,0,0,...}, because it can be represented as P(1)^2 * P(2)^1 * P(3)^0 * P(4)^0 * ...

In general, S(n) is the prime-exponent sequences of a positive integer K, if and only if :

K = P(1)^S(1) * P(2)^S(2) * P(3)^S(3) * P(4)^S(4) * P(5)^S(5) * . . .

Because P(n) is an unbounded increasing sequence, the sequence S(n) will remain zero for all sufficiently large values of n. I will refer to sequences which have this property as converging to zero.

Therefore, using the sequence of primes, we have been able to develop a one-to-one mapping between the positive integers and the set of all sequences of non-negative integers which converge to zero.

Binary Run-Length Sequences

The decimal notation, or base 10, that is used for representing numbers provides a convenient way to express very large numbers with comparably small sequences of digits. In computer science, a similar notation known as binary has been developed to represent signals used in digital computers. Binary numbers are just a different way to express a number using a more limited collection of digits, in this case, 0 and 1. Decimal notation will be affixed with (dec) and binary will be affixed with (bin). For example, 13 (dec) = 1101 (bin). It is clear that leading zero digits (as in 013) are insignificant in both decimal and binary.

For this discussion, I will be referring to the run-length or a sequence of digits. This is the number of consecutive instances of a digit from a given position in a given number. By consecutive instances, I mean digits in ascending significance (from right to left.) For example, in the decimal number 948777927, the sixth digit, which is a 7, has a run-length of 3.

Non-negative integers have a unique decimal representation, given sufficient limitation on symbolic representation (the absence of leading zeros, etc.). This property is true of all base numbering systems with integer bases greater than 1. Therefore, we can conclude that all non-negative integers have a unique binary representation or sequences of 0's and 1's. Also, we can say that this sequence, which runs in digits of increasing significance, must converge to zero.

Ex: 13 (dec) = 1101 (bin) has the binary sequence B(n) = {1,0,1,1,0,0,0,0,0,...}

I now define the run-lengh sequence R(n) of a given non-negative integer to be the sequence of run-lengths of 1's in the binary sequence of the given number with the run-lengths starting at the first element and at each digit following a zero digit. In other words, it is the sequence of the number of 1's between consecutive 0's in the binary sequence for a given number.

Ex: B(n) = {1,0,1,1,0,0,0,0,0,...} has the run-length sequence R(n) = {1,2,0,0,0,0,0,...}

Similar to the previous section, we have found a one-to-one mapping from the non-negative integers to the set of all sequences of non-negative integers which converge to zero.

Conclusion

It is worth noting that these mappings are established using very disparate mathematical technologies. Although in each case, a byte-sequence or other computer-related entity can be converted from integers to sequences and vice-versa. A conversion from integer-to-sequence using one technique, followed by another conversion from sequence-to-integer using the other technique, yields a one-to-one composite mapping between the set of positive integers and the set of non-negative integers.

It is not clear at this time what underlying structures exist using a more rigorous exploration using number theory. However, the mathematical machinery is relatively easy to program on a computer and encompasses a good deal of application potential for data compression or cryptographic applications.


Return to Dan's Home Page
Last Update: November 2, 2003