
Algorithms matter!

Algorithms matter! Knowing which algorithm to apply under which set of circumstances can make a big difference in the software you produce. If you don’t believe us, just read the following story about how Gary turned failure into success with a little analysis and by choosing the right algorithm for the job.

Once upon a time, Gary worked at a company with a lot of brilliant software developers. Like most organizations with a lot of bright people, there were many great ideas and people to implement them in the software products. One such person was Graham, who had been with the company from its inception. Graham came up with an idea on how to find out whether a program had any memory leaks, a common problem with C and C++ programs at the time. If a program ran long enough and had memory leaks, it would crash because it would run out of memory. Anyone who has programmed in a language that doesn’t support automatic memory management and garbage collection knows this problem well.

Graham decided to build a small library that wrapped the operating system’s memory allocation and de-allocation routines, malloc() and free(), with his own functions. Graham’s functions recorded each memory allocation and de-allocation in a data structure that could be queried when the program finished. The wrapper functions recorded the information and called the real operating system functions to perform the actual memory management. It took just a few hours for Graham to implement the solution and, voila, it worked! There was just one problem: the program ran so slowly when it was instrumented with Graham’s libraries that no one was willing to use it. We’re talking really slow here. You could start up a program, go have a cup of coffee (or maybe a pot of coffee), come back, and the program would still be crawling along. This was clearly unacceptable.

Now Graham was really smart when it came to understanding operating systems and how their internals work. He was an excellent programmer who could write more working code in an hour than most programmers could write in a day. He had studied algorithms, data structures, and all of the standard topics in college, so why did the code execute so much slower with the wrappers inserted? In this case, it was a problem of knowing enough to make the program work, but not thinking through the details to make it work quickly. Like many creative people, Graham was already thinking about his next program and didn’t want to go back to his memory leak program to find out what was wrong. So, he asked Gary to take a look at it and see whether he could fix it. Gary was more of a compiler and software engineering type of guy and seemed to be pretty good at honing code to make it release-worthy.

Gary thought he’d talk to Graham about the program before he started digging into the code. That way, he might better understand how Graham structured his solution and why he chose particular implementation options.

Understand the Problem

A good way to solve problems is to start with the big picture: understand the problem, identify potential causes, and then dig into the details. If you decide to try to solve the problem because you think you know the cause, you may solve the wrong problem, or you might not explore other, possibly better, answers. The first thing Gary did was ask Graham to describe the problem and his solution.

Graham said that he wanted to determine whether a program had any memory leaks. He thought the best way to find out would be to keep a record of all memory that was allocated by the program, whether it was freed before the program ended, and a record of where the allocation was requested in the user’s program. His solution required him to build a small library with three functions:

malloc()

A wrapper around the operating system’s memory allocation function

free()

A wrapper around the operating system’s memory de-allocation function

exit()

A wrapper around the operating system’s function called when a program exits

This custom library would be linked with the program under test in such a way that the customized functions would be called instead of the operating system’s functions. The custom malloc() and free() functions would keep track of each allocation and de-allocation. When the program under test finished, there would be no memory leak if every allocation was subsequently de-allocated. If there were any leaks, the information kept by Graham’s routines would allow the programmer to find the code that caused them. When the exit() function was called, the custom library routine would display its results before actually exiting. Graham sketched out what his solution looked like.
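The story does not reproduce Graham’s wrapper code, so the following is only a minimal sketch of the idea under stated assumptions: the names leak_malloc(), leak_free(), and leak_exit() are hypothetical, and a deliberately naive linked list stands in for Graham’s bookkeeping structure.

#include <stdio.h>
#include <stdlib.h>

struct record {
    void          *address;     /* address handed back by the real malloc() */
    size_t         size;        /* number of bytes requested */
    struct record *next;
};

static struct record *records = NULL;        /* every allocation not yet freed */

void *leak_malloc(size_t size) {
    void *p = malloc(size);                  /* the real allocation */
    struct record *r = malloc(sizeof *r);    /* plus a bookkeeping entry */
    r->address = p;
    r->size = size;
    r->next = records;
    records = r;
    return p;
}

void leak_free(void *p) {
    struct record **cur = &records;          /* unlink the matching record */
    while (*cur != NULL && (*cur)->address != p)
        cur = &(*cur)->next;
    if (*cur != NULL) {
        struct record *dead = *cur;
        *cur = dead->next;
        free(dead);
    }
    free(p);                                 /* the real de-allocation */
}

void leak_exit(int status) {
    for (struct record *r = records; r != NULL; r = r->next)
        fprintf(stderr, "leak: %zu bytes at %p\n", r->size, r->address);
    exit(status);
}

Note that leak_free() has to search the outstanding records for the matching address; that kind of bookkeeping cost is exactly what the rest of the story turns on.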

The description seemed clear enough. Unless Graham was doing something terribly wrong in his code to wrap the operating system functions, it was hard to imagine that there was a performance problem in the wrapper code. If there were, then all programs would be proportionately slow. Gary asked whether there was a difference in the performance of the programs Graham had tested. Graham explained that the running profile seemed to be that small programs (those that did relatively little) all ran in acceptable time, regardless of whether they had memory leaks. However, programs that did a lot of processing and had memory leaks ran disproportionately slowly.

Experiment if Necessary

Before going any further, Gary wanted to get a better understanding of the running profile of programs. He and Graham sat down and wrote some short programs to see how they ran with Graham’s custom library linked in. Perhaps they could get a better understanding of the conditions that caused the problem to arise. The first test program Gary and Graham wrote, Program A, appears below.

Program A

#include <stdlib.h>

int main(int argc, char **argv) {
    int i = 0;

    for (i = 0; i < 1000000; i++)
        malloc(32);

    exit(0);
}

They ran the program and waited for the results. It took several minutes to finish. Although computers were slower back then, this was clearly unacceptable. When this program finished, there were 32 MB of memory leaks. How would the program run if all of the memory allocations were de-allocated? They made a simple modification to create Program B.

Program B

#include <stdlib.h>

int main(int argc, char **argv) {
    int i = 0;

    for (i = 0; i < 1000000; i++) {
        void *x = malloc(32);
        free(x);
    }

    exit(0);
}

When they compiled and ran Program B, it completed in a few seconds. Graham was convinced that the problem was related to the number of memory allocations open when the program ended, but couldn’t figure out where the problem occurred. He had searched through his code for several hours and was unable to find any problems. Gary wasn’t as convinced as Graham that the problem was the number of memory leaks. He suggested one more experiment and made another modification to the program, Program C, in which the de-allocations were grouped together at the end of the program.

Program C

#include <stdlib.h>

int main(int argc, char **argv) {
    int i = 0;
    static void *addrs[1000000];    /* static so the million-pointer array does not overflow the stack */

    for (i = 0; i < 1000000; i++) {
        addrs[i] = malloc(32);
    }

    for (i = 0; i < 1000000; i++) {
        free(addrs[i]);
    }

    exit(0);
}

This program crawled along even slower than the first program! This example invalidated the theory that the number of memory leaks affected the performance of Graham’s program. However, the example gave Gary an insight that led to the real problem.

It wasn’t the number of memory allocations open at the end of the program that affected performance; it was the maximum number of them that were open at any single time. If memory leaks were not the only factor affecting performance, then there had to be something about the way Graham maintained the information used to determine whether there were leaks. In Program B, there was never more than one 32-byte chunk of memory allocated at any point during the program’s execution. The first and third programs each had one million allocations open at once.

Allocating and de-allocating memory was not the issue, so the problem must be in the bookkeeping code Graham wrote to keep track of the memory.

Gary asked Graham how he kept track of the allocated memory. Graham replied that he was using a binary tree where each node was a structure that consisted of pointers to the child nodes (if any), the address of the allocated memory, the size allocated, and the place in the user’s program where the allocation request was made. He added that he was using the memory address as the key for the nodes since there could be no duplicates, and this decision would make it easy to insert and delete records of allocated memory.
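Graham’s exact structure is not shown in the story; a node along the lines he describes might look something like this (the name alloc_record and the file/line fields are illustrative assumptions):

#include <stddef.h>

/* One node per outstanding allocation, keyed by the returned address. */
struct alloc_record {
    void                *address;   /* key: address returned by malloc() */
    size_t               size;      /* number of bytes requested */
    const char          *file;      /* where the request was made ...    */
    int                  line;      /* ... in the user's program         */
    struct alloc_record *left;      /* child nodes in the binary tree    */
    struct alloc_record *right;
};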

Using a binary tree is often more efficient than simply using an ordered linked list of items. If an ordered list of n items exists, and each item is equally likely to be sought, then a successful search uses, on average, about n/2 comparisons to find an item. Inserting into and deleting from an ordered list requires one to examine or move about n/2 items on average as well. Computer science textbooks would describe the performance of these operations (search, insert, and delete) as being O(n), which roughly means that as the size of the list doubles, the time to perform these operations is also expected to double.

Using a binary tree can deliver O(log n) performance for these same operations, although the code may be a bit more complicated to write and maintain. That is, as the size of the list doubles, the performance of these operations grows only by a constant amount. When processing 1,000,000 items, we expect to examine an average of 20 items, compared to about 500,000 if the items were contained in a list. Using a binary tree is a great choice, provided that the keys are distributed evenly in the tree. When the keys are not distributed evenly, the tree becomes distorted and loses those properties that make it a good choice for searching.
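The arithmetic behind those two numbers is easy to check; the tiny program below (compile with -lm) just prints the n/2 and log2(n) figures for n = 1,000,000.

#include <math.h>
#include <stdio.h>

int main(void) {
    double n = 1000000.0;
    printf("ordered list:  ~%.0f comparisons on average (n/2)\n", n / 2.0);    /* 500000 */
    printf("balanced tree: ~%.0f comparisons on average (log2 n)\n", log2(n)); /* ~20 */
    return 0;
}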

Knowing a bit about trees and how they behave, Gary asked Graham the $64,000 (it is logarithmic, after all) question: “Are you balancing the binary tree?” Graham’s response was surprising, since he was a very good software developer. “No, why should I do that? It makes the code a lot more complex.” But the fact that Graham wasn’t balancing the tree was exactly the problem causing the horrible performance of his code. Can you figure out why? The malloc() routine in C allocates memory (from the heap) in order of increasing memory addresses. Not only are these addresses not evenly distributed, the order is exactly the one that leads to right-oriented trees, which behave more like linear lists than binary trees. To see why, consider two binary trees built from the same numbers. The first was created by inserting the numbers 1-15 in order; its root node contains the value 1, and there is a path of 14 nodes to reach the node containing the value 15. The second was created by inserting these same numbers in the order <8, 4, 12, 2, 6, 10, 14, 1, 3, 5, 7, 9, 11, 13, 15>; in this case, the root node contains the value 8, but the paths to all other nodes in the tree are three nodes or less. The search time is directly affected by the length of the maximum path.
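As a quick illustration of that effect (not code from the story), the sketch below inserts steadily increasing, address-like keys into a plain, non-balancing binary search tree and reports the resulting depth, which grows linearly with the number of keys.

#include <stdio.h>
#include <stdlib.h>

struct node {
    unsigned long key;            /* stand-in for an allocation address */
    struct node *left, *right;
};

/* Plain BST insert with no rebalancing. */
static struct node *insert(struct node *root, unsigned long key) {
    if (root == NULL) {
        struct node *n = malloc(sizeof *n);
        n->key = key;
        n->left = n->right = NULL;
        return n;
    }
    if (key < root->key)
        root->left = insert(root->left, key);
    else
        root->right = insert(root->right, key);
    return root;
}

static int depth(const struct node *root) {
    if (root == NULL)
        return 0;
    int l = depth(root->left), r = depth(root->right);
    return 1 + (l > r ? l : r);
}

int main(void) {
    struct node *root = NULL;
    /* Simulate malloc() handing back steadily increasing addresses. */
    for (unsigned long k = 0; k < 10000; k++)
        root = insert(root, k * 32);
    printf("nodes: 10000, depth: %d\n", depth(root));   /* depth is 10000: a linked list in disguise */
    return 0;
}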

Algorithms to the Rescue

A balanced binary tree is a binary search tree for which the length of all paths from the root of the tree to any leaf node is as close to the same number as possible. Let depth(Li) be the length of the path from the root of the tree to a leaf node Li. In a perfectly balanced binary tree with n nodes, for any two leaf nodes L1 and L2, the absolute difference |depth(L2) - depth(L1)| ≤ 1; also, depth(Li) ≤ log2(n) for any leaf node Li. Gary went to one of his algorithms books and decided to modify Graham’s code so that the tree of allocation records would be balanced by making it a red-black binary tree. Red-black trees (Cormen et al., 2001) are an efficient implementation of a balanced binary tree in which, given any two leaf nodes L1 and L2, depth(L1)/depth(L2) ≤ 2; also, depth(Li) ≤ 2·log2(n+1) for any leaf node Li. In other words, a red-black tree is roughly balanced, which ensures that no path is more than twice as long as any other path.
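Gary’s red-black tree code is not shown in the story. As a hedged sketch of the same idea, the bookkeeping could lean on the POSIX tree routines from <search.h> (tsearch(), tfind(), tdelete()), which in glibc happen to be implemented with a red-black tree; the leak_malloc()/leak_free() names and the alloc_record layout are illustrative assumptions carried over from the earlier sketches, not Gary’s code.

#include <search.h>
#include <stdint.h>
#include <stdlib.h>

struct alloc_record {
    void   *address;    /* key: address returned by the real malloc() */
    size_t  size;
};

static void *tree_root = NULL;          /* root handle used by tsearch()/tdelete() */

static int by_address(const void *a, const void *b) {
    uintptr_t ka = (uintptr_t)((const struct alloc_record *)a)->address;
    uintptr_t kb = (uintptr_t)((const struct alloc_record *)b)->address;
    return (ka > kb) - (ka < kb);
}

void *leak_malloc(size_t size) {
    struct alloc_record *r = malloc(sizeof *r);
    r->address = malloc(size);                   /* the real allocation */
    r->size = size;
    tsearch(r, &tree_root, by_address);          /* balanced-tree insert */
    return r->address;
}

void leak_free(void *p) {
    struct alloc_record key = { .address = p };
    void *found = tfind(&key, &tree_root, by_address);
    if (found != NULL) {
        struct alloc_record *r = *(struct alloc_record **)found;
        tdelete(r, &tree_root, by_address);      /* balanced-tree delete */
        free(r);
    }
    free(p);                                     /* the real de-allocation */
}

Whatever the underlying balanced structure, the point is the same as Gary’s fix: insert, search, and delete stay close to O(log n) even when the keys arrive in increasing address order.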

The changes took a few hours to write and test. When he was done, Gary showed Graham the result. They ran each of the three programs shown previously.

Program A and Program C took just a few milliseconds longer than Program B. The performance improvement reflected approximately a 5,000-fold speedup. This is what might be expected when you consider that the average number of nodes to visit drops from 500,000 to 20. Actually, this is an order of magnitude off: you might expect a 25,000-fold speedup, but that is offset by the computation overhead of balancing the tree. Still, the results are dramatic, and Graham’s memory leak detector could be released (with Gary’s modifications) in the next version of the product.

Side Story

Given the efficiency of using red-black binary trees, is it possible that the malloc() implementation itself is coded to use them? After all, the memory allocation functionality must somehow maintain the set of allocated regions so they can be safely deallocated. Also, note that each of the programs listed previously makes allocation requests for 32 bytes. Does the size of the request affect the performance of malloc() and free() requests? To investigate the behavior of malloc(), we ran a set of experiments. First, we timed how long it took to allocate 4,096 chunks of n bytes, with n ranging from 1 to 2,048. Then, we timed how long it took to deallocate the same memory using three strategies (a rough sketch of this kind of timing harness follows the list):

freeUp

In the order in which it was allocated; this is identical to Program C

freeDown

In the reverse order in which it was allocated

freeScattered

In a scattered order that ultimately frees all memory
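The authors’ harness is not reproduced here; the sketch below shows roughly how such a measurement could be taken for one chunk size and the freeUp strategy, using clock() for timing (the constant and variable names are illustrative).

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define CHUNKS 4096

static double elapsed_ms(clock_t start, clock_t end) {
    return 1000.0 * (double)(end - start) / CLOCKS_PER_SEC;
}

int main(void) {
    static void *ptrs[CHUNKS];
    size_t n = 32;                               /* chunk size under test */

    clock_t t0 = clock();
    for (int i = 0; i < CHUNKS; i++)             /* allocate 4,096 chunks of n bytes */
        ptrs[i] = malloc(n);
    clock_t t1 = clock();

    for (int i = 0; i < CHUNKS; i++)             /* freeUp: release in allocation order */
        free(ptrs[i]);
    clock_t t2 = clock();

    printf("n=%zu  alloc: %.3f ms  freeUp: %.3f ms\n",
           n, elapsed_ms(t0, t1), elapsed_ms(t1, t2));

    /* freeDown would walk ptrs[] from CHUNKS-1 down to 0; freeScattered would
     * free every k-th pointer in repeated passes until all chunks are gone. */
    return 0;
}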

For each value of n we ran the experiment 100 times and discarded the best and worst performing runs. Figure 1-3 contains the average results of the remaining 98 trials. As one might expect, the performance of the allocation follows a linear trend: as the size n increases, so does the cost, in proportion to n. Surprisingly, the way in which the memory is deallocated changes the performance. freeUp has the best performance, for example, while freeDown executes about four times more slowly.

The empirical evidence does not answer whether malloc() and free() use binary trees (balanced or not!) to store information; without inspecting the source for free(), there is no easy explanation for the different performance based upon the order in which the memory is deallocated.

The Moral of the Story

The previous story really happened. Algorithms do matter. You might ask whether the tree-balancing algorithm was the optimal solution for the problem. That’s a great question, and one that we’ll answer by asking another question: does it really matter? Finding the right algorithm is like finding the right solution to any problem. Instead of finding the perfect solution, the algorithm just has to work well enough. You must balance the cost of the solution against the value it adds. It’s quite possible that Gary’s implementation could be improved, either by optimizing his implementation or by using a different algorithm. However, the performance of the memory leak detection software was more than acceptable for the intended use, and any additional improvements would have been unproductive overhead.

The ability to choose an acceptable algorithm for your needs is a critical skill that any good software developer should have. You don’t necessarily have to be able to perform detailed mathematical analysis on the algorithm, but you must be able to understand someone else’s analysis. You don’t have to invent new algorithms, but you do need to understand which algorithms fit the problem at hand.

References

Heineman, George T., Gary Pollice, and Stanley Selkow. Algorithms in a Nutshell. O’Reilly Media, 2008.

Cormen, Thomas H., Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms, Second Edition. McGraw-Hill, 2001.