Blog Archives

Linear Time Sorting, Counting Sort & Radix Sort

The lower bound of sorting algorithms implies that if we hope to sort numbers faster than O(n log n), we cannot do it by making comparisons alone. Is it possible to sort without making comparisons?

The answer is yes, but only under very restrictive circumstances. Many applications involve sorting small integers (e.g. sorting characters, exam scores, etc.). We present three algorithms based on the theme of speeding up sorting in special cases, by not making comparisons.

Counting sort assumes that the numbers to be sorted are in the range 1 to k where k is small. The basic idea is to determine the rank of each number in final sorted array.

The rank of an item is the number of elements that are less than or equal to it. Once we know the ranks, we simply copy numbers to their final position in an output array.

The question is how to find the rank of an element without comparing it to the other elements of the array?. The algorithm uses three arrays.

A[1..n] holds the initial input, B[1..n] holds the sorted output and C[1..k] is an array of integers.

C[x] is the rank of x in A, where x ∈ [1..k].

The algorithm is remarkably simple, but deceptively clever. The algorithm operates by first constructing C. This is done in two steps.

First we set C[x] to be the number of elements of A[j] that are equal to x. We can do this initializing C to zero, and then for each j, from 1 to n, we increment C[A[j]] by 1.

Thus, if A[j] = 5, then the 5th element of C is incremented, indicating that we have seen one more 5. To determine the number of elements that are less than or equal to x, we replace C[x] with the sum of elements in the sub array R[1 : x]. This is done by just keeping a running total of the elements of C.

C[x] now contains the rank of x. This means that if x = A[j] then the final position of A[j] should be at position C[x] in the final sorted array.

Thus, we set B[C[x]] = A[j]. Notice We need to be careful if there are duplicates, since we do not want them to overwrite the same location of B. To do this, we decrement C[i] after copying.

COUNTING-SORT( array  A, array  B, int  k)

1    for i ← 1 to  k

2    do C[i] ← 0           k times

3    for j ← 1 to  length[A]

4    do C[A[j]] ← C[A[j]] + 1             n times

5    // C[i] now contains the number of elements = i

6    for i ← 2 to  k

7    do C[i] ← C[i] + C[i − 1]            k times

8    // C[i] now contains the number of elements ≤ i

9    for j ← length[A] downto 1

10    do B[C[A[j]]] ← A[j]

11         C[A[j]] ← C[A[j]] − 1             n times

There are four (un-nested) loops, executed k times, n times, k − 1 times, and n times, respectively, so the total running time is Θ(n + k) time. If k = O(n), then the total running time is Θ(n).

Counting sort is not an in-place sorting algorithm but it is stable. Stability is important because data are often carried with the keys being sorted. radix sort (which uses counting sort as a subroutine) relies on it to work correctly. Stability achieved by running the loop down from n to 1 and not the other way around.

Radix Sort

The main shortcoming of counting sort is that it is useful for small integers, i.e., 1..k where k is small. If k were a million or more, the size of the rank array would also be a million. Radix sort provides a nice work around this limitation by sorting numbers one digit at a time.

576            49[4]            9[5]4            [1]76            176

494            19[4]            5[7]6            [1]94            194

194            95[4]            1[7]6            [2]78            278

296    ⇒     57[6]    ⇒     2[7]8    ⇒     [2]96    ⇒     296

278            29[6]            4[9]4            [4]94            494

176            17[6]            1[9]4            [5]76            576

954            27[8]            2[9]6            [9]54            954

Here is the algorithm that sorts A[1..n] where each number is d digits long.

RADIX-SORT( array  A, int  n, int  d)

1    for i ← 1 to  d

2    do stably sort A w.r.t i th lowest order digit