Binary Search Tree
The mathematical definition of a binary tree is
“A binary tree is a finite set of elements that is either empty or is partitioned into three disjoint subsets. The first subset contains a single element called the root of the tree. The other two subsets are themselves binary trees called the left and right sub-trees”. Each element of a binary tree is called a node of the tree.
There are nine nodes in the above figure of the binary tree. The node A is at the top of the tree. There are two lines from the node A to left and right sides towards node B and C respectively.
The definition of tree is of recursive nature. This is due to the fact that We have seen that which definition has been applied to the tree having node A as root, is applied to its subtrees one level downward. Similarly as long as we go downward in the tree the same definition is used at each level of the tree. And we see three parts i.e. root, left subtree and right subtree at each level.
Now if we carry out the same process on the right subtree of root node A, the node C will become the root of this right subtree of A. The left subtree of this root node C is empty. The right subtree of C is made up of three nodes F, H and I.
I have discussed that the node on the top is said root of the tree. If we consider the relationship of these nodes, A is the parent node with B and C as the left and right descendants respectively. C is the right descendant of A. Afterwards, We look at the node B where the node D is its left descendant and E is its right descendant. I can use the words descendant and child interchangeably. Now look at the nodes D, G, H and I. These nodes are said leaf nodes as there is no descendant of these nodes. I am just introducing the terminology here. In the algorithms, I can use the words root or parent, child or descendant. So the names never matter.
Strictly Binary Tree
There is a version of the binary tree, called strictly binary tree. A binary tree is said to be a strictly binary tree if every non-leaf node in a binary tree has non-empty left and right subtrees.
20+ 21+ 22 + ……… + 2d = 6d j=0 2j = 2d+1 – 1
Thus according to this summation, the total number of nodes in a complete binary tree of depth d will be 2d+1 – 1. Thus if there is a complete binary tree of depth 4, the total number of nodes in it will be calculated by putting the value of d equal to 4. It will be calculated as under.
24+1 – 1 = 25 – 1 = 32 – 1 = 31
Thus the total number of nodes in the complete binary tree of depth 4 is 31.
We know that the total number of nodes (leaf and non-leaf) of a complete binary tree of depth d is equal to 2d+1 – 1. In a complete binary tree, all the leaf nodes are at the depth level d. So the number of nodes at level d will be 2d . These are the leaf nodes. Thus the difference of total number of nodes and number of leaf nodes will give us the number of non-leaf (inner) nodes. It will be (2d+1 – 1) – 2d i.e. 2d – 1. Thus we conclude that in a complete binary tree, there are 2d leaf nodes and 2d – 1 non-leaf (inner) nodes.
Level of a Complete Binary Tree
We can conclude some more results from the above mathematical expression. We can find the depth of a complete binary tree if we know the total number of nodes. If we have a complete binary tree with n total nodes, then by the equation of the total number of nodes I can write
Total number of nodes = 2d+1 – 1 = n
To find the value of d, I solve the above equation as under
2d+1 – 1 = n
2d+1 = n + 1
d + 1 = log2 (n + 1)
d = log2 (n + 1) – 1
After having n total nodes, we can find the depth d of the tree by the above equation. Suppose we have 1000,000 nodes. It means that the value of n is 1000,000, reflecting a depth i.e. d of the tree will be log2 (1000000 + 1) – 1, which evaluates to 20. So the depth of the tree will be 20. In other words, the tree will be 20 levels deep.
Cost of Search
In the discussion on binary tree, I talked about the level and number of nodes of a binary tree. It was witnessed that if we have a complete binary tree with n numbers of nodes, the depth d of the tree can be found by the following equation:
d = log2 (n + 1) – 1
Suppose I have a complete binary tree in which there are 1000,000 nodes, then its depth d will be calculated in the following fashion.
d = log2 (1000000 + 1) – 1 = log2 (1000001) – 1= 20
The statement shows that if there are 1000,000 unique numbers (nodes) stored in a complete binary tree, the tree will have 20 levels. Now if I want to find a number x in this tree (in other words, the number is not in the tree), I have to make maximum 20 comparisons i.e. one comparison at each level. Now I can understand the benefit of tree as compared to the linked list. If I have a linked list of 1000,000 numbers, then there may be 1000,000 comparisons (supposing the number is not there) to find a number x in the list.
Thus in a tree, the search is very fast as compared to the linked list. If the tree is complete binary or near-to-complete, searching through 1000,000 numbers will require a maximum of 20 comparisons or in general, approximately log2(n). Whereas in a linked list, the comparisons required could be a maximum of n.
Tree with the linked structure, is not a difficult data structure. I have used it to allocate memory, link and set pointers to it. It is not much difficult process. In a tree, I link the nodes in such a way that it does not remain a linear structure. If instead of 1000,000, I have 1 million or 10 million or say, 1 billion numbers and want to build a complete binary tree of these numbers, the depth (number of levels) of the tree will be log2 of these numbers.The log2 of these numbers will be a small number, suppose 25, 30 or 40.
Thus I see that the number of level does not increase in such a ratio as the number of nodes increase. So I can search a number x in a complete binary tree of 1 billion nodes only in 30-40 comparisons. As the linked list of such a large number grows large, the search of a number in such a case will also get time consuming process. The usage of memory space does not cause any effect in the linked list and tree data structures. I use the memory dynamically in both structures.
However, time is a major factor. Suppose one comparison takes one micro second, then one billion seconds are required to find a number from a linked list (I are supposing the worst case of search where traversing of the whole list may be needed). This time will be in hours. On the other hand, in case of building a complete binary tree of these one billion numbers, I have to make 30-40 comparisons (as the levels of the tree will be 30-40), taking only 30-40 microseconds. I can clearly see the difference between hours and microseconds. Thus it is better to prefer the process of building a tree of the data to storing it in a linked list to make the search process faster.
Binary Search Tree
While discussing the search procedure, the tree for search was built in a specific order. The order was such that on the addition of a number in the tree, we compare it with a node. If it is less than this, it can be added to the left sub-tree of the node. Otherwise, it will be added on the right sub-tree.
This way, the tree built by us has numbers less than the root in the left sub-tree and the numbers greater than the root in the right sub-tree. A binary tree with such a property that items in the left sub-tree are smaller than the root and items in the right sub-tree are larger than the root is called a binary search tree (BST).
The searching and sorting operations are very common in computer science. In most of the cases, I sort the data before a search operation. The building process of a binary search tree is actually a process of storing the data in a sorted form. The BST has many variations, which will be discussed later. The BST and its variations play an important role in searching algorithms. As data in a BST is in an order, it may also be termed as ordered tree.
Traversing a Binary Tree
Now let’s discuss the ways to print the numbers present in a BST. In a linked list, the printing of stored values is easy. It is due to the fact that I know where from, a programmer needs to start and where the next element is. Equally is true about printing of the elements in an array. I execute a for loop starting from the first element (i.e. index 0) to the last element of the array. Now let’s see how we can traverse a tree to print (display) the numbers (or any data items) of the tree.
Here we see that the root node has the number 14. The left sub-tree has only one node i.e. number 4. Similarly the right sub-tree consists of a single node with the number 15. If we apply the permutations combinations on these three nodes to print them, there may be the following six possibilities.
1: (4, 14, 15)
2: (14, 4, 15)
3: (15, 4, 14)
4: (4, 15, 14)
5: (14, 15, 4)
6: (15, 14, 4)
Look at these six combinations of printing the nodes of the tree. In the first combination, the order of printing the nodes is 4-14-15. It means that left subtree- root-right subtree. In the second combination the order is root-left subtree-right subtree. In the third combination, the order of printing the nodes is right subtree-root- left subtree. The fourth combination has left subtree-right subtree-root. The fifth combination has the order root-rigth subtree- left subtree.
Finally the sixth combination has the order of printing the nodes right subtree-root-left subtree. These six possibilities are for the three nodes only. If we have a tree having a large number of nodes, then there may increase number of permutations for printing the nodes.
Let’s see the general procedure of traversing a binary tree. We know by definition that a binary tree consists of three sets i.e. root, left subtree and right subtree.
1: (L, N, R)
2: (N, L, R)
3: (R, L, N)
4: (L, R, N)
5: (N, R, L)
6: (R, N, L)
In these permutations, the left and right subtree are not single nodes. These may consist of several nodes. Thus where I see L in the permutations, it means traversing the left subtree. Similarly R means traversing the right subtree. In the previous tree of three nodes, these left and right subtrees are of single nodes. However, they can consist of any number of nodes. We select the following three permutations from the above six. The first of these three is (N, L, R), also called as preorder traversal. The second permutation is (L, N, R) which I called inorder traversal. Finally the third permutation, also termed as postorder traversal is (L, R, N).
Please follow below link to see complete code of a binary search tree template class.
- Binary Search Tree template Class (BST) (alikhuram.wordpress.com)
Posted on April 30, 2013, in A library re-written., Algorithms, C++, Data Structures and tagged Binary tree, BinarySearchTree, Data structure, Tree, Tree traversal. Bookmark the permalink. Leave a comment.