Blog Archives

Linear Time Sorting – Bucket or Bin Sort

Assume that the keys of the items that we wish to sort lie in a small fixed range and that there is only one item with each value of the key. Then we can sort with the following procedure:

1. Set up an array of “bins” – one for each value of the key – in order,

2. Examine each item and use the value of the key to place it in the appropriate bin.

Now our collection is sorted and it only took n operations, so this is an O(n) operation. However, note that it will only work under very restricted conditions. To understand these restrictions, let’s be a little more precise about the specification of the problem and assume that there are m values of the key. To recover our sorted collection, we need to examine each bin. This adds a third step to the algorithm above,

3. Examine each bin to see whether there’s an item in it.

which requires m operations. So the algorithm’s time becomes:

 T (n) = c1n + c2m

and it is strictly O(n + m). If m ≤ n, this is clearly O(n).  However if m >> n, then it is O(m).  An implementation of bin sort might look like:

BUCEKTSORT( array  A, int  n, int  M)

1    // Pre-condition: for 1 ≤ i ≤ n, 0 ≤ a[i] < M

2    // Mark all the bins empty

3    for i ← 1 to  M

4    do bin[i] ← Empty

5    for i ← 1 to  n

6    do bin[A[i]] ← A[i]

If there are duplicates, then each bin can be replaced by a linked list. The third step then becomes:

3. Link all the lists into one list.

We can add an item to a linked list in O(1) time. There are n items requiring O(n) time. Linking a list to another list simply involves making the tail of one list point to the other, so it is O(1). Linking m such lists obviously takes O(m) time, so the algorithm is still O(n + m).