Blog Archives

Linear Time Sorting – Bucket or Bin Sort

Assume that the keys of the items that we wish to sort lie in a small fixed range and that there is only one item with each value of the key. Then we can sort with the following procedure:

1. Set up an array of “bins” – one for each value of the key – in order,

2. Examine each item and use the value of the key to place it in the appropriate bin.

Now our collection is sorted and it only took n operations, so this is an O(n) operation. However, note that it will only work under very restricted conditions. To understand these restrictions, let’s be a little more precise about the specification of the problem and assume that there are m values of the key. To recover our sorted collection, we need to examine each bin. This adds a third step to the algorithm above,

3. Examine each bin to see whether there’s an item in it.

which requires m operations. So the algorithm’s time becomes:

 T (n) = c1n + c2m

and it is strictly O(n + m). If m ≤ n, this is clearly O(n).  However if m >> n, then it is O(m).  An implementation of bin sort might look like:

BUCEKTSORT( array  A, int  n, int  M)

1    // Pre-condition: for 1 ≤ i ≤ n, 0 ≤ a[i] < M

2    // Mark all the bins empty

3    for i ← 1 to  M

4    do bin[i] ← Empty

5    for i ← 1 to  n

6    do bin[A[i]] ← A[i]

If there are duplicates, then each bin can be replaced by a linked list. The third step then becomes:

3. Link all the lists into one list.

We can add an item to a linked list in O(1) time. There are n items requiring O(n) time. Linking a list to another list simply involves making the tail of one list point to the other, so it is O(1). Linking m such lists obviously takes O(m) time, so the algorithm is still O(n + m).

Linear Time Sorting, Counting Sort & Radix Sort

The lower bound of sorting algorithms implies that if we hope to sort numbers faster than O(n log n), we cannot do it by making comparisons alone. Is it possible to sort without making comparisons?

The answer is yes, but only under very restrictive circumstances. Many applications involve sorting small integers (e.g. sorting characters, exam scores, etc.). We present three algorithms based on the theme of speeding up sorting in special cases, by not making comparisons.

Counting sort assumes that the numbers to be sorted are in the range 1 to k where k is small. The basic idea is to determine the rank of each number in final sorted array.

The rank of an item is the number of elements that are less than or equal to it. Once we know the ranks, we simply copy numbers to their final position in an output array.

The question is how to find the rank of an element without comparing it to the other elements of the array?. The algorithm uses three arrays.

A[1..n] holds the initial input, B[1..n] holds the sorted output and C[1..k] is an array of integers.

C[x] is the rank of x in A, where x ∈ [1..k].

The algorithm is remarkably simple, but deceptively clever. The algorithm operates by first constructing C. This is done in two steps.

First we set C[x] to be the number of elements of A[j] that are equal to x. We can do this initializing C to zero, and then for each j, from 1 to n, we increment C[A[j]] by 1.

Thus, if A[j] = 5, then the 5th element of C is incremented, indicating that we have seen one more 5. To determine the number of elements that are less than or equal to x, we replace C[x] with the sum of elements in the sub array R[1 : x]. This is done by just keeping a running total of the elements of C.

C[x] now contains the rank of x. This means that if x = A[j] then the final position of A[j] should be at position C[x] in the final sorted array.

Thus, we set B[C[x]] = A[j]. Notice We need to be careful if there are duplicates, since we do not want them to overwrite the same location of B. To do this, we decrement C[i] after copying.

COUNTING-SORT( array  A, array  B, int  k)

1    for i ← 1 to  k

2    do C[i] ← 0           k times

3    for j ← 1 to  length[A]

4    do C[A[j]] ← C[A[j]] + 1             n times

5    // C[i] now contains the number of elements = i

6    for i ← 2 to  k

7    do C[i] ← C[i] + C[i − 1]            k times

8    // C[i] now contains the number of elements ≤ i

9    for j ← length[A] downto 1

10    do B[C[A[j]]] ← A[j]

11         C[A[j]] ← C[A[j]] − 1             n times

There are four (un-nested) loops, executed k times, n times, k − 1 times, and n times, respectively, so the total running time is Θ(n + k) time. If k = O(n), then the total running time is Θ(n).

Counting sort is not an in-place sorting algorithm but it is stable. Stability is important because data are often carried with the keys being sorted. radix sort (which uses counting sort as a subroutine) relies on it to work correctly. Stability achieved by running the loop down from n to 1 and not the other way around.

Radix Sort

The main shortcoming of counting sort is that it is useful for small integers, i.e., 1..k where k is small. If k were a million or more, the size of the rank array would also be a million. Radix sort provides a nice work around this limitation by sorting numbers one digit at a time.

576            49[4]            9[5]4            [1]76            176

494            19[4]            5[7]6            [1]94            194

194            95[4]            1[7]6            [2]78            278

296    ⇒     57[6]    ⇒     2[7]8    ⇒     [2]96    ⇒     296

278            29[6]            4[9]4            [4]94            494

176            17[6]            1[9]4            [5]76            576

954            27[8]            2[9]6            [9]54            954

Here is the algorithm that sorts A[1..n] where each number is d digits long.

RADIX-SORT( array  A, int  n, int  d)

1    for i ← 1 to  d

2    do stably sort A w.r.t i th lowest order digit