# Binary Search Tree

The mathematical definition of a binary tree is

“A binary tree is a finite set of elements that is either empty or is partitioned into three disjoint subsets. The first subset contains a single element called the root of the tree. The other two subsets are themselves binary trees called the left and right sub-trees”. Each element of a binary tree is called a node of the tree.

A simple binary tree of size 9 and height 3, with a root node whose value is 2. The above tree is unbalanced and not sorted. (Photo credit: Wikipedia)

There are nine nodes in the above figure of the binary tree.  The node A is at the top of the tree. There are two lines from the node A to left and right sides towards node B and C respectively.
The definition of tree is of recursive nature. This is due to the fact that We have seen that which definition has been applied to the tree having node A as root, is applied to its subtrees one level downward. Similarly as long as we go downward in the tree the same definition is used at each level of the tree. And we see three parts i.e. root, left subtree and right subtree at each level.

Now if we carry out the same process on the right subtree of root node A, the node C will become the root of this right subtree of A. The left subtree of this root node C is empty. The right subtree of C is made up of three nodes F, H and I.

I have discussed that the node on the top is said root of the tree. If we consider the relationship of these nodes, A is the parent node with B and C as the left and right descendants respectively. C is the right descendant of A. Afterwards, We look at the node B where the node D is its left descendant and E is its right descendant. I can use the words descendant and child interchangeably. Now look at the nodes D, G, H and I. These nodes are said leaf nodes as there is no descendant of these nodes. I am just introducing the terminology here. In the algorithms, I can use the words root or parent, child or descendant. So the names never matter.

Strictly Binary  Tree

There is a version of the binary tree, called strictly binary tree. A binary tree is said to be a strictly binary tree if every non-leaf node in a binary tree has non-empty left and right subtrees.

Complete Binary Tree

The definition of the complete binary tree is“A complete binary tree of depth d is the strictly binary tree all of whose leaves are at level d”.The number of nodes at a level is equal to the level number raising to the power of two. Thus we can say that in a complete binary tree at a particular level k, the number of nodes complete is equal  to 2k. Note  that  this  formula  for the number  of nodes  is for a binary  tree  only.  It  is  not  necessary  that  every  binary  tree  fulfill  this criterion. Applying this formula to each level while going to the depth of the tree (i.e. d), we can calculate the total number of nodes in a complete binary tree of depth d by adding  the  number  of nodes  at  each  level. This  can  be  written  as  the  following

20+ 21+ 22 + ……… + 2d = 6d j=0 2j = 2d+1 – 1

Thus according to this summation, the total number of nodes in a complete binary tree of depth d will be 2d+1  – 1. Thus if there is a complete binary tree of depth 4, the total number of nodes in it will be calculated by putting the value of d equal to 4. It will be calculated as under.

24+1 – 1 = 25 – 1 = 32 – 1 = 31

Thus the total number of nodes in the complete binary tree of depth 4 is 31.

We know that the total number of nodes (leaf and non-leaf) of a complete binary tree of depth d is equal to 2d+1 – 1. In a complete binary tree, all the leaf nodes are at the depth level d. So the number of nodes at level d will be 2d . These are the leaf nodes. Thus the difference of total number of nodes and number of leaf nodes will give us the number of non-leaf (inner) nodes. It will be (2d+1 – 1) – 2d  i.e. 2d  – 1. Thus we conclude that in a complete binary tree, there are 2d  leaf nodes and  2d  – 1 non-leaf (inner) nodes.

Level of a Complete Binary Tree

We can conclude some more results from the above mathematical expression. We can find the depth of a complete binary tree if we know the total number of nodes. If we have a complete  binary  tree with n total nodes,  then by the equation  of the total number of nodes I can write

Total number of nodes = 2d+1 – 1 = n

To find the value of d, I solve the above equation as under

2d+1 – 1 = n

2d+1 =  n + 1

d + 1 = log2 (n + 1)

d = log2 (n + 1) – 1

After having n total nodes, we can find the depth d of the tree by the above equation. Suppose we have 1000,000 nodes. It means that the value of n is 1000,000, reflecting a depth i.e. d of the tree will be log2  (1000000 + 1) – 1, which evaluates to 20. So the depth of the tree will be 20. In other words, the tree will be 20 levels deep.

Cost of Search

In the discussion on binary tree, I talked about the level and number of nodes of a binary tree. It was witnessed that if we have a complete binary tree with n numbers of nodes, the depth d of the tree can be found by the following equation:

d = log2 (n + 1) – 1

Suppose I have a complete binary tree in which there are 1000,000 nodes, then its depth d will be calculated in the following fashion.

d = log2 (1000000 + 1) – 1 = log2 (1000001) – 1= 20

The statement shows that if there are 1000,000 unique numbers (nodes) stored in a complete binary tree, the tree will have 20 levels. Now if I want to find a number x in this tree (in other words, the number is not in the tree), I have to make maximum 20 comparisons i.e. one comparison at each level. Now I can understand the benefit of  tree as compared to the linked list. If I have a linked list of 1000,000 numbers, then there may be 1000,000 comparisons (supposing the number is not there) to find a number x in the list.

Thus in a tree, the search is very fast as compared to the linked list. If  the tree is complete binary or near-to-complete, searching through 1000,000 numbers will require a maximum of 20 comparisons  or in general, approximately  log2(n). Whereas in a linked list, the comparisons required could be a maximum of n.

Tree with the linked structure, is not a difficult data structure. I have used it to allocate memory, link and set pointers to it. It is not much difficult process. In a tree, I link the nodes in such a way that it does not remain a linear structure. If instead of 1000,000, I have 1 million or 10 million or say, 1 billion numbers and want to build a complete binary tree of these numbers, the depth (number of levels) of the tree will be log2 of these numbers.The log2 of these numbers will be a small number, suppose 25, 30 or 40.

Thus I see that the number of level does not increase in such a ratio as the number of nodes increase. So I can search a number x in a complete binary tree of 1 billion nodes only in 30-40 comparisons. As the linked list of such a large number grows large, the search of a number  in such a case will also get time consuming process. The usage of memory space does not cause any effect in the linked list and tree data structures.  I use the memory dynamically  in both structures.

However, time is a major factor. Suppose one comparison  takes one micro second, then one billion seconds are required to find a number from a linked list (I are supposing the worst case of search where traversing of the whole list may be needed). This time will be in hours. On the other hand, in case of building a complete binary tree of these one billion numbers, I have to make 30-40 comparisons (as the levels of the tree will be 30-40), taking only 30-40 microseconds. I can clearly see the difference between hours and microseconds. Thus it is better to prefer the process of building a tree of the data to storing it in a linked list to make the search process faster.

Binary Search Tree

While  discussing  the search procedure,  the tree for search  was built in a specific order. The order was such that on the addition of a number in the tree, we compare it with a node. If it is less than this, it can be added to the left sub-tree of the node. Otherwise, it will be added on the right sub-tree.

This way, the tree built by us has numbers less than the root in the left sub-tree and the numbers greater than the root in the right sub-tree. A binary tree with such a property that items in the left sub-tree are smaller than the root and items in the right sub-tree are larger than the root is called a binary search  tree (BST).

The searching and sorting operations are very common in computer science. In most of the cases, I sort the data before a search operation. The building process of a binary search tree is actually a process of storing the data in a sorted form. The BST has many variations, which will be discussed later. The BST and its variations play an important role in searching algorithms. As data in a BST is in an order, it may also be termed as ordered tree.

Traversing a Binary Tree

Now let’s discuss the ways to print the numbers present in a BST. In a linked list, the printing of stored values is easy. It is due to the fact that I know where from, a programmer  needs  to  start  and  where  the  next  element  is.  Equally  is  true  about printing of the elements in an array. I execute a for loop starting from the first element (i.e. index 0) to the last element of the array. Now let’s see how we can traverse a tree to print (display) the numbers (or any data items) of the tree.

Here we see that the root node has the number 14. The left sub-tree has only one node i.e. number 4. Similarly the right sub-tree consists of a single node with the number 15.  If we apply the permutations  combinations  on these three nodes to print them, there may be the following six possibilities.

1: (4, 14, 15)

2: (14, 4, 15)

3: (15, 4, 14)

4: (4, 15, 14)

5: (14, 15, 4)

6: (15, 14, 4)

Look  at  these  six  combinations  of  printing  the  nodes  of  the  tree.  In  the  first combination,  the order of printing the nodes is 4-14-15. It means that left subtree- root-right  subtree.  In  the  second  combination  the  order  is  root-left  subtree-right subtree. In the third combination, the order of printing the nodes is right subtree-root- left subtree. The fourth combination has left subtree-right subtree-root. The fifth combination   has   the   order   root-rigth   subtree-   left   subtree.

Finally   the   sixth combination has the order of printing the nodes right subtree-root-left subtree. These six possibilities are for the three nodes only. If we have a tree having a large number of nodes, then there may increase number of permutations for printing the nodes.

Let’s see the general procedure of traversing a binary tree. We know by definition that a  binary  tree  consists  of  three  sets  i.e.  root,  left  subtree  and  right  subtree.

1: (L, N, R)

2: (N, L, R)

3: (R, L, N)

4: (L, R, N)

5: (N, R, L)

6: (R, N, L)

In these  permutations,  the left and right  subtree  are not single  nodes.  These  may consist of several nodes. Thus where I see L in the permutations, it means traversing the left subtree. Similarly R means traversing the right subtree. In the previous tree of three  nodes,  these  left and right  subtrees  are of single  nodes.  However,  they  can consist of any number of nodes. We select the following three permutations from the above six. The first of these three is (N, L, R), also called as preorder traversal. The second permutation is (L, N, R) which I called inorder traversal. Finally the third permutation,  also termed as postorder  traversal is (L, R, N).