Blog Archives

How to operate on tree data structures using multiple parallel threads of execution. Parallel algorithms allow for higher performance on multi-core machines, but getting a correct result from parallel execution can be tricky. This set of slides illustrates a few primary algorithms that operate on tree data structures in parallel.

Generating Random Numbers

To be precise, random number generation functions, including rand, return pseudo-random numbers as opposed to truly random numbers, so whenever I say random, I actually mean pseudo-random. Before using the rand function you need to seed (i.e., initialize) the random number generator with a call to srand. This ensures that subsequent calls to rand won’t produce the same sequence of numbers each time the program is run. The simplest way to seed the random number generator is to pass the result of a call to clock from the <ctime> header as an unsigned int. Keep in mind that reseeding a random number generator causes its output to be less random, so seed it only once per run.
The rand function is limited in many ways. To begin with, it only generates integers, and only does so using a uniform distribution. Furthermore, the specific random number generation algorithm used is implementation specific and, thus, random number sequences are not reproducible from system to system given the same seed. This is a problem for certain kinds of applications, as well as when testing and debugging.

If you want to generate random floating-point numbers in the interval [0.0, 1.0) with a uniform distribution, the C++ standard library provides the C runtime function rand in the <cstdlib> header, which returns a random number in the range 0 to RAND_MAX inclusive. The RAND_MAX macro represents the highest value the rand function can return. A demonstration of using rand to generate random floating-point numbers appears below.


#include <cstdlib>
#include <ctime>
#include <iostream>
using namespace std;

double doubleRand( )
{
    // scale rand( ) into the half-open interval [0.0, 1.0)
    return double(rand( )) / (double(RAND_MAX) + 1.0);
}

int main( )
{
    // seed the generator once with the current clock value
    srand(static_cast<unsigned int>(clock( )));
    cout << "expect 5 numbers within the interval [0.0, 1.0)" << endl;
    for (int i=0; i < 5; i++)
    {
        cout << doubleRand( ) << "\n";
    }
    cout << endl;
}

The program above should produce output similar to:
expect 5 numbers within the interval [0.0, 1.0)
0.010437
0.740997
0.34906
0.369293
0.544373

A much more sophisticated alternative to rand is the Boost Random library by Jens Maurer. The Boost Random library provides several high-quality random number generators for both integer and floating-point types, and supports numerous kinds of distributions. The code below demonstrates how you can produce random floating-point numbers in the interval [0, 1).


#include <boost/random.hpp>
#include <iostream>
#include <cstdlib>
using namespace std;
using namespace boost;

typedef boost::mt19937 BaseGenerator;
typedef boost::uniform_real<> Distribution;
typedef boost::variate_generator<BaseGenerator, Distribution> Generator;

double boostDoubleRand( )
{
    // a single static generator shared across calls
    static BaseGenerator base;
    static Distribution dist;
    static Generator rng(base, dist);
    return rng( );
}

int main( )
{
    cout << "expect 5 numbers within the interval [0,1)" << endl;
    for (int i=0; i < 5; i++)
    {
        cout << boostDoubleRand( ) << "\n";
    }
    cout << endl;
}

The main advantage of the Boost Random library is that the pseudo-random number generation algorithm has guaranteed and reproducible randomness properties based on the precise algorithm chosen. In the code above I use the Mersenne Twister generator (mt19937) because it offers a good blend of performance and randomness.
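
To see that reproducibility concretely, here is a minimal sketch (my own illustration, not part of the recipe above; the seed value 42 is arbitrary): two mt19937 engines seeded with the same value produce exactly the same sequence on any platform with a conforming Boost installation, which is precisely the guarantee rand cannot make.

#include <boost/random.hpp>
#include <iostream>
using namespace std;

int main( )
{
    // two Mersenne Twister engines seeded with the same (arbitrary) value
    boost::mt19937 gen1(42u);
    boost::mt19937 gen2(42u);
    boost::uniform_real<> dist(0.0, 1.0);
    boost::variate_generator<boost::mt19937, boost::uniform_real<> > rng1(gen1, dist);
    boost::variate_generator<boost::mt19937, boost::uniform_real<> > rng2(gen2, dist);

    // the two streams are identical, run after run and machine after machine
    for (int i = 0; i < 3; i++)
    {
        cout << rng1( ) << " == " << rng2( ) << "\n";
    }
}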

Performing Arithmetic on Bitsets in C++

The bitset class template comes with basic operations for manipulating sets of bits but doesn’t provide any arithmetic or comparison operations. This is because the library can’t safely assume what kind of numerical type a programmer might expect an arbitrary set of bits to represent. The functions below treat a bitset as a representation of an unsigned integer type, and provide functions for adding, subtracting, multiplying, dividing, and comparing bitsets. These functions can provide a basis for writing custom-sized integer types. I did not use the most efficient algorithms I could for the functions below; instead, I chose the simplest possible algorithms because they are more easily understood. A much more efficient implementation would use similar algorithms, but would operate on whole words rather than single bits.

The functions below allow arithmetic on and comparison of the bitset class template from the <bitset> header as if it represented an unsigned integer.


#include <stdexcept>
#include <bitset>
bool fullAdder(bool b1, bool b2, bool& carry)
{
    bool sum = (b1 ^ b2) ^ carry;
    carry = (b1 && b2) || (b1 && carry) || (b2 && carry);
    return sum;
}

bool fullSubtractor(bool b1, bool b2, bool& borrow)
{
    bool diff;
    if (borrow)
    {
        diff = !(b1 ^ b2);
        borrow = !b1 || (b1 && b2);
    }
    else
    {
        diff = b1 ^ b2;
        borrow = !b1 && b2;
    }
    return diff;
}

template<unsigned int N>
bool bitsetLtEq(const std::bitset<N>& x, const std::bitset<N>& y)
{
    for (int i=N-1; i >= 0; i--)
    {
        if (x[i] && !y[i]) return false;
        if (!x[i] && y[i]) return true;
    }
    return true;
}

template<unsigned int N>
bool bitsetLt(const std::bitset<N>& x, const std::bitset<N>& y)
{
    for (int i=N-1; i >= 0; i--)
    {
        if (x[i] && !y[i]) return false;
        if (!x[i] && y[i]) return true;
    }
    return false;
}

template<unsigned int N>
bool bitsetGtEq(const std::bitset<N>& x, const std::bitset<N>& y)
{
    for (int i=N-1; i >= 0; i--)
    {
        if (x[i] && !y[i]) return true;
        if (!x[i] && y[i]) return false;
    }
    return true;
}

template<unsigned int N>
bool bitsetGt(const std::bitset<N>& x, const std::bitset<N>& y)
{
    for (int i=N-1; i >= 0; i--)
    {
        if (x[i] && !y[i]) return true;
        if (!x[i] && y[i]) return false;
    }
    return false;
}

template<unsigned int N>
void bitsetAdd(std::bitset<N>& x, const std::bitset<N>& y)
{
    bool carry = false;
    for (int i = 0; i < N; i++)
    {
        x[i] = fullAdder(x[i], y[i], carry);
    }
}

template<unsigned int N>
void bitsetSubtract(std::bitset<N>& x, const std::bitset<N>& y)
{
    bool borrow = false;
    for (int i = 0; i < N; i++)
    {
        if (borrow)
        {
            if (x[i])
            {
                x[i] = y[i];
                borrow = y[i];
            }
            else
            {
                x[i] = !y[i];
                borrow = true;
            }
        }
        else
        {
            if (x[i])
            {
                x[i] = !y[i];
                borrow = false;
            }
            else
            {
                x[i] = y[i];
                borrow = y[i];
            }
        }
    }
}

template<unsigned int N>
void bitsetMultiply(std::bitset<N>& x, const std::bitset<N>& y)
{
    std::bitset<N> tmp = x;
    x.reset( );
    // we want to minimize the number of times we shift and add
    if (tmp.count( ) < y.count( ))
    {
        for (int i=0; i < N; i++)
            if (tmp[i]) bitsetAdd(x, y << i);
    }
    else
    {
        for (int i=0; i < N; i++)
            if (y[i]) bitsetAdd(x, tmp << i);
    }
}

template<unsigned int N>
void bitsetDivide(std::bitset<N> x, std::bitset<N> y,
                  std::bitset<N>& q, std::bitset<N>& r)
{
    if (y.none( ))
    {
        throw std::domain_error("division by zero undefined");
    }
    q.reset( );
    r.reset( );
    if (x.none( ))
    {
        return;
    }
    if (x == y)
    {
        q[0] = 1;
        return;
    }
    r = x;
    if (bitsetLt(x, y))
    {
        return;
    }

    // count significant digits in divisor and dividend
    unsigned int sig_x;
    for (int i=N-1; i>=0; i--)
    {
        sig_x = i;
        if (x[i]) break;
    }
    unsigned int sig_y;
    for (int i=N-1; i>=0; i--)
    {
        sig_y = i;
        if (y[i]) break;
    }
    // align the divisor with the dividend
    unsigned int n = (sig_x - sig_y);
    y <<= n;
    // make sure the loop executes the right number of times
    n += 1;
    // long division algorithm, shift, and subtract
    while (n--)
    {
        // shift the quotient to the left
        if (bitsetLtEq(y, r))
        {
            // add a new digit to the quotient
            q[n] = true;
            bitsetSubtract(r, y);
        }
        // shift the divisor to the right
        y >>= 1;
    }
}

Using the bitset_arithmetic.hpp functions:

#include "bitset_arithmetic.hpp"
#include <bitset>
#include <iostream>
#include <string>
using namespace std;

int main( )
{
    bitset<10> bits1(string("100010001"));
    bitset<10> bits2(string("000000011"));
    bitsetAdd(bits1, bits2);
    cout << bits1.to_string<char, char_traits<char>, allocator<char> >( ) << endl;
}

The program produces the following output:
0100010100
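
As a further usage sketch (assuming the same bitset_arithmetic.hpp functions shown above), multiplication and division work the same way; bitsetDivide returns the quotient and remainder through its reference parameters:

#include "bitset_arithmetic.hpp"
#include <bitset>
#include <iostream>
#include <string>
using namespace std;

int main( )
{
    bitset<10> a(string("0000000110"));   // 6
    bitset<10> b(string("0000000100"));   // 4
    bitset<10> q, r;

    bitsetMultiply(a, b);                 // a becomes 24
    cout << a.to_string<char, char_traits<char>, allocator<char> >( ) << endl;

    bitsetDivide(a, b, q, r);             // 24 / 4 = 6 remainder 0
    cout << q.to_string<char, char_traits<char>, allocator<char> >( ) << endl;
    cout << r.to_string<char, char_traits<char>, allocator<char> >( ) << endl;
}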


Minimum Spanning Tree Algorithms

Given an undirected, connected graph G=(V,E), one might be concerned with finding a subset ST of edges from E that “span” the graph by ensuring that the graph remains connected. If we further require that the total weight of the edges in ST is minimized, then we are interested in finding a minimum spanning tree (MST). PRIM’S ALGORITHM shows how to construct an MST from such a graph by using a greedy approach in which each step of the algorithm makes forward progress toward a solution without reversing earlier decisions. PRIM’S ALGORITHM grows a spanning tree T one edge at a time until an MST results (and the resulting spanning tree is provably minimum).

It randomly selects a start vertex s∈V to belong to a growing set S, and it ensures that T forms a tree of edges rooted at s. PRIM’S ALGORITHM is greedy in that it incrementally adds edges to T until an MST is computed. The intuition behind the algorithm is that the edge (u,v) with lowest weight between u∈S and v∈V–S must belong to the MST. When such an edge (u,v) with lowest weight is found, it is added to T and the vertex v is added to S.

The algorithm uses a priority queue to store the vertices v∈V–S with an associated priority equal to the lowest weight of some edge (u,v) where u∈S. This carefully designed approach ensures the efficiency of the resulting implementation.

Solution
The C++ solution below relies on a binary heap to provide the implementation of the priority queue that is central to PRIM’S ALGORITHM. Ordinarily, using a binary heap would be inefficient because of the check in the main loop for whether a particular vertex is a member of the priority queue (an operation not supported by binary heaps). However, the algorithm only ever removes vertices from the priority queue as it processes them, so we need only maintain a status array inQueue[] that is updated whenever a vertex is extracted from the priority queue.

In another implementation optimization, we maintain an external array key[] that records the current priority key for each vertex in the queue, which again eliminates the need to search the priority queue for a given vertex identifier.

#include "BinaryHeap.h"
#include "Graph.h"

/**
 * Prim’s Algorithm implementation with binary heap
 *
 * Given undirected graph, compute MST starting from a randomly
 * selected vertex. Encoding of MST is done using 'pred' entries.
 */
void mst_prim (Graph const &graph, vector<int> &pred)
{
    // initialize pred[] and key[] arrays. Start with arbitrary
    // vertex s=0. Priority Queue PQ contains all v in G.
    const int n = graph.numVertices( );
    pred.assign(n, -1);
    vector<int> key(n, numeric_limits<int>::max( ));
    key[0] = 0;
    BinaryHeap pq(n);
    vector<bool> inQueue(n, true);

    for (int v = 0; v < n; v++)
    {
        pq.insert(v, key[v]);
    }

    while (!pq.isEmpty( ))
    {
        int u = pq.smallest( );
        inQueue[u] = false;

        // Process all neighbors of u to find if any edge beats best distance
        for (VertexList::const_iterator ci = graph.begin(u); ci != graph.end(u); ++ci)
        {
            int v = ci->first;
            if (inQueue[v])
            {
                int w = ci->second;
                if (w < key[v])
                {
                    pred[v] = u;
                    key[v] = w;
                    pq.decreaseKey(v, w);
                }
            }
        }
    }
}

Consequences
For dense graphs, the priority queue can be implemented instead with a Fibonacci heap. This improves the performance to O(E+V*log V), a significant speedup over the binary heap implementation.

Analysis
The initialization phase of PRIM’S ALGORITHM inserts each vertex into the priority queue (implemented by a binary heap) for a total cost of O(V log V). The decreaseKey operation requires O(log q) time, where q is the number of elements in the queue, which will always be less than |V|. It can be called at most 2*|E| times, since each vertex is removed once from the priority queue and each undirected edge in the graph is visited exactly twice. Thus the total performance is O((V + 2*E) log V), which is O((V + E) log V).

Single-Source Shortest Path (Dijkstra’s Algorithm Implementation in C++)

Suppose you want to fly a private plane on the shortest path from Saint Johnsbury, VT to Waco, TX. Assume you know the distances between the airports for all pairs of cities and towns that are reachable from each other in one nonstop flight of your plane. The best-known algorithm to solve this problem, Dijkstra’s Algorithm, finds the shortest path from Saint Johnsbury to all other airports, although the search may be halted once the shortest path to Waco is known.

Dijkstra’s Algorithm conceptually operates in greedy fashion by expanding a set of vertices, S, for which the shortest path from s to every vertex v ∈ S is known, but only using paths that include vertices in S. Initially, S equals the set {s}. To expand S, Dijkstra’s Algorithm finds the vertex v ∈ V−S whose distance to s is smallest, and follows v’s edges to see whether a shorter path exists to another vertex. After processing v2, for example, the algorithm determines that the distance from s to v3 is really 17 through the path <s, v2, v3>. Once S expands to equal V, the algorithm completes.

Input/Output

Input

A directed, weighted graph G=(V,E) and a source vertex s ∈ V. Each edge e=(u,v) has an associated positive weight in the graph. The quantity n represents the number of vertices in G.

Output

Dijkstra’s Algorithm produces two computed arrays. The primary result is the array dist[] of values representing the distance from source vertex s to each vertex in the graph. Note that dist[s] is zero. The secondary result is the array pred[], which can be used to rediscover the actual shortest paths from vertex s to each vertex in the graph.
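
For example, the actual route to some destination vertex t can be read off by walking pred[] backwards until the source (marked by -1) is reached. A minimal sketch, assuming pred[] was filled in by the implementation shown later:

#include <vector>
#include <iostream>
using namespace std;

// Walk pred[] backwards from t to the source (whose pred entry is -1),
// then print the vertices in source-to-t order.
void printPath (vector<int> const &pred, int t)
{
    vector<int> path;
    for (int v = t; v != -1; v = pred[v])
    {
        path.push_back(v);
    }
    for (int i = (int) path.size( ) - 1; i >= 0; i--)
    {
        cout << path[i] << (i > 0 ? " -> " : "\n");
    }
}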

Assumptions

The edge weights are positive (i.e., greater than zero); if this assumption is not true, then dist[u] may contain invalid results. Even worse, Dijkstra’s Algorithm will loop forever if a cycle exists whose sum of all weights is less than zero.

Solution

As Dijkstra’s Algorithm executes, dist[v] represents the maximum length of the shortest path found from the source s to v using only vertices visited within the set S. Also, for each v ∈ S, dist[v] is correct. Fortunately, Dijkstra’s Algorithm does not actually compute and store the set S. It initially constructs a set containing the vertices in V, and then it removes vertices one at a time from the set to compute proper dist[v] values; for convenience, we continue to refer to this ever-shrinking set as V−S. Dijkstra’s Algorithm terminates when all vertices are either visited or are shown to not be reachable from the source vertex s.

In the C++ solution shown below, a binary heap stores the vertices in the set V−S as a priority queue because, in constant time, one can locate the vertex with smallest priority (where the priority is determined by the vertex’s distance from s). Additionally, when a shorter path from s to v is found, dist[v] is decreased, requiring the heap to be modified. Fortunately, the decreaseKey operation on priority queues represented using binary heaps can be performed on average in O(log q) time, where q is the number of vertices in the binary heap, which will always be less than or equal to the number of vertices, n.

//Dijkstra’s Algorithm with priority queue implementation

#include "BinaryHeap.h"
#include "Graph.h"

/** Given directed, weighted graph, compute shortest distance to vertices
* (dist) and record predecessor links (pred) for all vertices. */

void singleSourceShortest(Graph const &g, int s, vector<int> &dist, vector<int> &pred)
{
// initialize dist[] and pred[] arrays. Start with vertex s by setting
// dist[] to 0. Priority Queue PQ contains all v in G.

const int n = g.numVertices( );
pred.assign(n, -1);
dist.assign(n, numeric_limits<int>::max( ));
dist[s] = 0;
BinaryHeap pq(n);

for (int u = 0; u < n; u++)
{
pq.insert (u, dist[u]);
}
// find vertex in ever-shrinking set, V-S, whose dist[] is smallest.
// Recompute potential new paths to update all shortest paths

while (!pq.isEmpty( ))
{
int u = pq.smallest( );
// For neighbors of u, see if newLen (best path from s->u + weight
// of edge u->v) is better than best path from s->v. If so, update
// in dist[v] and re-adjust binary heap accordingly. Compute in
// long to avoid overflow error.

for (VertexList::const_iterator ci = g.begin(u); ci != g.end(u); ++ci)
{
int v = ci->first;
long newLen = dist[u];
newLen += ci->second;

if (newLen < dist[v])
{
pq.decreaseKey (v, newLen);
dist[v] = newLen;
pred[v] = u;
}
}
}
}

Consequences

Arithmetic error may also occur if the sum of the individual edge weights exceeds numeric_limits<int>::max( ) (although the individual values do not). To avoid this situation, the computed newLen uses a long data type.
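
A minimal illustration of that pitfall, with made-up values: if dist[u] already sits near numeric_limits<int>::max( ), adding even a small edge weight in int arithmetic would overflow, while accumulating the sum in a long (as the code above does) keeps the comparison meaningful on platforms where long is wider than int.

#include <iostream>
#include <limits>
using namespace std;

int main( )
{
    int distU = numeric_limits<int>::max( ) - 5;   // a path length near the int ceiling
    int weight = 10;                               // a small edge weight

    long newLen = distU;                           // accumulate in long, as the algorithm does
    newLen += weight;

    // on platforms with 64-bit long this prints 2147483652, a value past INT_MAX;
    // the equivalent int addition would overflow instead
    cout << newLen << endl;
}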

Analysis

In the implementation of Dijkstra’s Algorithm, the loop that constructs the initial priority queue performs the insert operation V times, resulting in performance O(V log V). In the remaining while loop, each edge is visited once, and thus decreaseKey is called no more than E times, which contributes O(E log V) time. Thus, the overall performance is O((V + E) log V).

The C++ implementation below is simpler since it avoids the use of a binary heap. The efficiency of this version is determined by considering how fast the smallest dist[] value in V−S can be retrieved. The while loop is executed n times, since S grows one vertex at a time. Finding the smallest dist[u] in V−S inspects all n vertices. Note that each edge is inspected exactly once in the inner loop within the while loop. Thus, the total running time of this version is O(V² + E).

//Implementation of Dijkstra’s Algorithm for dense graphs

#include "Graph.h"

void singleSourceShortest(Graph const &graph, int s, vector<int> &dist, vector<int> &pred)
{

// initialize dist[] and pred[] arrays. Start with vertex s by setting
// dist[] to 0.

const int n = graph.numVertices( );
pred.assign(n, -1);
dist.assign(n, numeric_limits<int>::max( ));
vector<bool> visited(n);
dist[s] = 0;

// find vertex in ever-shrinking set, V-S, whose dist value is smallest
// Recompute potential new paths to update all shortest paths

while (true)
{
// find shortest distance so far in unvisited vertices
int u = -1;
int sd = numeric_limits<int>::max( ); // assume not reachable

for (int i = 0; i < n; i++)
{
if (!visited[i] && dist[i] < sd)
{
sd = dist[i];
u = i;
}
}
if (u == -1)
{
break; // no more progress to be made
}

// For neighbors of u, see if length of best path from s->u + weight
// of edge u->v is better than best path from s->v.
visited[u] = true;

for (VertexList::const_iterator ci = graph.begin(u); ci != graph.end(u); ++ci)
{
int v = ci->first; // the neighbor v
long newLen = dist[u]; // compute as long
newLen += ci->second; // sum with (u,v) weight

if (newLen < dist[v])
{
dist[v] = newLen;
pred[v] = u;
}
}
}
}

We can further optimize to remove all of the C++ Standard Template Library objects, as shown below. By reducing the overhead of the supporting classes, we realize impressive performance benefits, as discussed in the “Comparison” section.

/**
* Optimized Dijkstra’s Algorithm for dense graphs
*
* Given int[][] of edge weights in raw form, compute shortest distance to
* all vertices in graph (dist) and record predecessor links for all
* vertices (pred) to be able to recreate these paths. An edge weight of
* INF means no edge. Suitable for Dense Graphs Only.
*/

void singleSourceShortestDense(int n, int ** const weight, int s,int *dist, int *pred)
{
// initialize dist[] and pred[] arrays. Start with vertex s by setting
// dist[] to 0. All vertices are unvisited.
bool *visited = new bool[n];
for (int v = 0; v < n; v++)
{
dist[v] = numeric_limits<int>::max( );
pred[v] = -1;
visited[v] = false;
}

dist[s] = 0;

// find shortest distance from s to all unvisited vertices. Recompute
// potential new paths to update all shortest paths. Exit if u remains -1.
while (true)
{
int u = -1;
int sd = numeric_limits<int>::max( );
for (int i = 0; i < n; i++)
{
if (!visited[i] && dist[i] < sd)
{
sd = dist[i];
u = i;
}
}
if (u == -1)
{
break;
}
// For neighbors of u, see if length of best path from s->u + weight
// of edge u->v is better than best path from s->v. Compute using longs.
visited[u] = true;
for (int v = 0; v < n; v++)
{
int w = weight[u][v];
if (v == u) continue;
long newLen = dist[u];
newLen += w;
if (newLen < dist[v])
{
dist[v] = newLen;
pred[v] = u;
}
}
}
delete [] visited;
}
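
A small driver for the routine above, using a made-up 4-vertex graph; INF (numeric_limits<int>::max( )) marks the missing edges. This is only a sketch of how the raw-array interface might be exercised.

#include <iostream>
#include <limits>
using namespace std;

// assumes singleSourceShortestDense( ) from above is visible in this translation unit

int main( )
{
    const int n = 4;
    const int INF = numeric_limits<int>::max( );

    // symmetric weight matrix; INF means "no edge"
    int raw[n][n] = { {   0,   5,   2, INF },
                      {   5,   0, INF,   1 },
                      {   2, INF,   0,   7 },
                      { INF,   1,   7,   0 } };

    // build the int** view the function expects
    int **weight = new int*[n];
    for (int i = 0; i < n; i++) { weight[i] = raw[i]; }

    int dist[n], pred[n];
    singleSourceShortestDense(n, weight, 0, dist, pred);

    // expected: dist = {0, 5, 2, 6}, pred = {-1, 0, 0, 1}
    for (int v = 0; v < n; v++)
    {
        cout << "dist[" << v << "] = " << dist[v]
             << ", pred[" << v << "] = " << pred[v] << endl;
    }
    delete [] weight;
}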

Basics of Algorithm Complexity Analysis


A lot of programmers who make some of the coolest and most useful software today, such as much of what we see on the Internet or use daily, don’t have a theoretical computer science background. They’re still pretty awesome and creative programmers and we thank them for what they build.

However, theoretical computer science has its uses and applications and can turn out to be quite practical. In this article, targeted at programmers who know their art but who don’t have any theoretical computer science background, I will present one of the most pragmatic tools of computer science: Big O notation and algorithm complexity analysis. As someone who has worked both in a computer science academic setting and in building production-level software in the industry, this is the tool I have found to be one of the truly useful ones in practice, so I hope after reading this article you can apply it in your own code to make it better. After reading this post, you should be able to understand all the common terms computer scientists use such as “big O”, “asymptotic behavior” and “worst-case analysis”.

I believe this text will be helpful for industry programmers who don’t have too much experience with theoretical computer science (it is a fact that some of the most inspiring software engineers never went to college). But because it’s also for students, it may at times sound a little bit like a textbook. In addition, some of the topics in this text may seem too obvious to you; for example, you may have seen them during your high school years. If you feel you understand them, you can skip them. Other sections go into a bit more depth and become slightly theoretical. But these things are still good to know and not tremendously hard to follow, so it’s likely well worth your time. As the original text was targeted at high school students, no mathematical background is required, so anyone with some programming experience (i.e. if you know what recursion is) will be able to follow through without any problem.

Big O notation and algorithm complexity analysis is something a lot of industry programmers and junior students alike find hard to understand, fear, or avoid altogether as useless. But it’s not as hard or as theoretical as it may seem at first. Algorithm complexity is just a way to formally measure how fast a program or algorithm runs, so it really is quite pragmatic. Let’s start by motivating the topic a little bit.


Figure 1: Artificial intelligence characters in video games use algorithms to avoid obstacles when navigating in the virtual world

Motivation

We already know there are tools to measure how fast a program runs. There are programs called profilers which measure running time in milliseconds and can help us optimize our code by spotting bottlenecks. While this is a useful tool, it isn’t really relevant to algorithm complexity. Algorithm complexity is something designed to compare two algorithms at the idea level — ignoring low-level details such as the implementation programming language, the hardware the algorithm runs on, or the instruction set of the given CPU. We want to compare algorithms in terms of just what they are: Ideas of how something is computed. Counting milliseconds won’t help us in that. It’s quite possible that a bad algorithm written in a low-level programming language such as assembly runs much quicker than a good algorithm written in a high-level programming language such as Python or Ruby. So it’s time to define what a “better algorithm” really is.

As algorithms are programs that perform just a computation, and not other things computers often do such as networking tasks or user input and output, complexity analysis allows us to measure how fast a program is when it performs computations. Examples of operations that are purely computational include numerical floating-point operations such as addition and multiplication; searching within a database that fits in RAM for a given value; determining the path an artificial-intelligence character will walk through in a video game so that they only have to walk a short distance within their virtual world (see Figure 1); or running a regular expression pattern match on a string. Clearly, computation is ubiquitous in computer programs.

Complexity analysis is also a tool that allows us to explain how an algorithm behaves as the input grows larger. If we feed it a different input, how will the algorithm behave? If our algorithm takes 1 second to run for an input of size 1000, how will it behave if I double the input size? Will it run just as fast, half as fast, or four times slower? In practical programming, this is important as it allows us to predict how our algorithm will behave when the input data becomes larger. For example, if we’ve made an algorithm for a web application that works well with 1000 users and measure its running time, using algorithm complexity analysis we can have a pretty good idea of what will happen once we get 2000 users instead. For algorithmic competitions, complexity analysis gives us insight about how long our code will run for the largest testcases that are used to test our program’s correctness. So if we’ve measured our program’s behavior for a small input, we can get a good idea of how it will behave for larger inputs. Let’s start by a simple example: Finding the maximum element in an array.

Counting instructions

In this article, I’ll use various programming languages for the examples. However, don’t despair if you don’t know a particular programming language. Since you know programming, you should be able to read the examples without any problem even if you aren’t familiar with the programming language of choice, as they will be simple and I won’t use any esoteric language features. If you’re a student competing in algorithms competitions, you most likely work with C++, so you should have no problem following through. In that case I recommend working on the exercises using C++ for practice.

The maximum element in an array can be looked up using a simple piece of code such as this piece of Javascript code. Given an input array A of size n:

var M = A[ 0 ];

for ( var i = 0; i < n; ++i ) {
    if ( A[ i ] > M ) {
        M = A[ i ];
    }
}

Now, the first thing we’ll do is count how many fundamental instructions this piece of code executes. We will only do this once and it won’t be necessary as we develop our theory, so bear with me for a few moments as we do this. As we analyze this piece of code, we want to break it up into simple instructions; things that can be executed by the CPU directly – or close to that. We’ll assume our processor can execute the following operations as one instruction each:

  • Assigning a value to a variable
  • Looking up the value of a particular element in an array
  • Comparing two values
  • Incrementing a value
  • Basic arithmetic operations such as addition and multiplication

We’ll assume branching (the choice between if and else parts of code after the if condition has been evaluated) occurs instantly and won’t count these instructions. In the above code, the first line of code is:

var M = A[ 0 ];

This requires 2 instructions: One for looking up A[ 0 ] and one for assigning the value to M (we’re assuming that n is always at least 1). These two instructions are always required by the algorithm, regardless of the value of n. The for loop initialization code also has to always run. This gives us two more instructions; an assignment and a comparison:

i = 0;

i < n;

These will run before the first for loop iteration. After each for loop iteration, we need two more instructions to run, an increment of i and a comparison to check if we’ll stay in the loop:

++i;

i < n;

So, if we ignore the loop body, the number of instructions this algorithm needs is 4 + 2n. That is, 4 instructions at the beginning of the for loop and 2 instructions at the end of each iteration of which we have n. We can now define a mathematical function f( n ) that, given an n, gives us the number of instructions the algorithm needs. For an empty for body, we have f( n ) = 4 + 2n.

Worst-case analysis

Now, looking at the for body, we have an array lookup operation and a comparison that happen always:

if ( A[ i ] > M ) { …

That’s two instructions right there. But the if body may run or may not run, depending on what the array values actually are. If it happens to be so that A[ i ] > M, then we’ll run these two additional instructions — an array lookup and an assignment:

M = A[ i ]

But now we can’t define an f( n ) as easily, because our number of instructions doesn’t depend solely on n but also on our input. For example, for A = [ 1, 2, 3, 4 ] the algorithm will need more instructions than for A = [ 4, 3, 2, 1 ]. When analyzing algorithms, we often consider the worst-case scenario. What’s the worst that can happen for our algorithm? When does our algorithm need the most instructions to complete? In this case, it is when we have an array in increasing order such as A = [ 1, 2, 3, 4 ]. In that case, M needs to be replaced every single time and so that yields the most instructions. Computer scientists have a fancy name for that and they call it worst-case analysis; that’s nothing more than just considering the case when we’re the most unlucky. So, in the worst case, we have 4 instructions to run within the for body, so we have f( n ) = 4 + 2n + 4n = 6n + 4. This function f, given a problem size n, gives us the number of instructions that would be needed in the worst-case.
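
To make that counting concrete, here is a small C++ sketch (my own illustration, not part of the original text) that tallies the cost model by hand while running the maximum-finding loop. It charges the if body on every iteration, which is exactly the worst-case assumption made above, so the tally comes out to 6n + 4.

#include <iostream>
using namespace std;

int main( )
{
    const int n = 10;
    int A[n];
    for (int i = 0; i < n; i++) { A[i] = i; }   // the array contents don't affect the worst-case tally

    long count = 0;

    count += 2;                 // M = A[0]: one array lookup + one assignment
    int M = A[0];

    count += 2;                 // loop setup: the assignment i = 0 and the first i < n test
    for (int i = 0; i < n; ++i)
    {
        count += 2;             // every iteration: one lookup A[i] + one comparison with M
        if (A[i] > M) { M = A[i]; }
        count += 2;             // worst-case charge for the if body: one lookup + one assignment
        count += 2;             // ++i and the next i < n test
    }

    cout << "tallied " << count << " instructions, formula 6n + 4 gives " << 6 * n + 4 << endl;
}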

Asymptotic behavior

Given such a function, we have a pretty good idea of how fast an algorithm is. However, as I promised, we won’t be needing to go through the tedious task of counting instructions in our program. Besides, the number of actual CPU instructions needed for each programming language statement depends on the compiler of our programming language and on the available CPU instruction set (i.e. whether it’s an AMD or an Intel Pentium on your PC, or a MIPS processor on your Playstation 2) and we said we’d be ignoring that. We’ll now run our “f” function through a “filter” which will help us get rid of those minor details that computer scientists prefer to ignore.

In our function, 6n + 4, we have two terms: 6n and 4. In complexity analysis we only care about what happens to the instruction-counting function as the program input (n) grows large. This really goes along with the previous ideas of “worst-case scenario” behavior: We’re interested in how our algorithm behaves when treated badly; when it’s challenged to do something hard. Notice that this is really useful when comparing algorithms. If an algorithm beats another algorithm for a large input, it’s most probably true that the faster algorithm remains faster when given an easier, smaller input. From the terms that we are considering, we’ll drop all the terms that grow slowly and only keep the ones that grow fast as n becomes larger. Clearly 4 remains a 4 as n grows larger, but 6n grows larger and larger, so it tends to matter more and more for larger problems. Therefore, the first thing we will do is drop the 4 and keep the function as f( n ) = 6n.

This makes sense if you think about it, as the 4 is simply an “initialization constant”. Different programming languages may require a different time to set up. For example, Java needs some time to initialize its virtual machine. Since we’re ignoring programming language differences, it only makes sense to ignore this value.

The second thing we’ll ignore is the constant multiplier in front of n, and so our function will become f( n ) = n. As you can see this simplifies things quite a lot. Again, it makes some sense to drop this multiplicative constant if we think about how different programming languages compile. The “array lookup” statement in one language may compile to different instructions in different programming languages. For example, in C, doing A[ i ] does not include a check that i is within the declared array size, while in Pascal it does. So, the following Pascal code:

M := A[ i ]

Is the equivalent of the following in C:

if ( i >= 0 && i < n ) {
    M = A[ i ];
}

So it’s reasonable to expect that different programming languages will yield different factors when we count their instructions. In our example in which we are using a dumb compiler for Pascal that is oblivious of possible optimizations, Pascal requires 3 instructions for each array access instead of the 1 instruction C requires. Dropping this factor goes along the lines of ignoring the differences between particular programming languages and compilers and only analyzing the idea of the algorithm itself.

This filter of “dropping all factors” and of “keeping the largest growing term” as described above is what we call asymptotic behavior. So the asymptotic behavior of f( n ) = 2n + 8 is described by the function f( n ) = n. Mathematically speaking, what we’re saying here is that we’re interested in the limit of function f as n tends to infinity; but if you don’t understand what that phrase formally means, don’t worry, because this is all you need to know. (On a side note, in a strict mathematical setting, we would not be able to drop the constants in the limit; but for computer science purposes, we want to do that for the reasons described above.) Let’s work a couple of examples to familiarize ourselves with the concept.

Figure 2: The n³ function, drawn in blue, becomes larger than the 1999n function, drawn in red, after n = 45. After that point it remains larger forever.

Let us find the asymptotic behavior of the following example functions by dropping the constant factors and by keeping the terms that grow the fastest.

  1. f( n ) = 5n + 12 gives f( n ) = n.

By using the exact same reasoning as above.

  2. f( n ) = 109 gives f( n ) = 1.

We’re dropping the multiplier 109 * 1, but we still have to put a 1 here to indicate that this function has a non-zero value.

  3. f( n ) = n² + 3n + 112 gives f( n ) = n²

Here, n² grows larger than 3n for sufficiently large n, so we’re keeping that.

  4. f( n ) = n³ + 1999n + 1337 gives f( n ) = n³

Even though the factor in front of n is quite large, we can still find a large enough n so that n³ is bigger than 1999n. As we’re interested in the behavior for very large values of n, we only keep n³ (See Figure 2).

  5. f( n ) = n + √n gives f( n ) = n

This is so because n grows faster than √n as we increase n.

Complexity

So what this is telling us is that since we can drop all these decorative constants, it’s pretty easy to tell the asymptotic behavior of the instruction-counting function of a program. In fact, any program that doesn’t have any loops will have f( n ) = 1, since the number of instructions it needs is just a constant (unless it uses recursion; see below). Any program with a single loop which goes from 1 to n will have f( n ) = n, since it will do a constant number of instructions before the loop, a constant number of instructions after the loop, and a constant number of instructions within the loop which all run n times.

This should now be much easier and less tedious than counting individual instructions, so let’s take a look at a couple of examples to get familiar with this. The following PHP program checks to see if a particular value exists within an array A of size n:

<?php
    $exists = false;
    for ( $i = 0; $i < n; ++$i ) {
        if ( $A[ $i ] == $value ) {
            $exists = true;
            break;
        }
    }
?>

This method of searching for a value within an array is called linear search. This is a reasonable name, as this program has f( n ) = n (we’ll define exactly what “linear” means in the next section). You may notice that there’s a “break” statement here that may make the program terminate sooner, even after a single iteration. But recall that we’re interested in the worst-case scenario, which for this program is for the array A to not contain the value. So we still have f( n ) = n.

Let’s look at a Python program which adds two array elements together to produce a sum which it stores in another variable:

v = a[ 0 ] + a[ 1 ]

Here we have a constant number of instructions, so we have f( n ) = 1.

The following program in C++ checks to see if a vector (a fancy array) named A of size n contains the same two values anywhere within it:

bool duplicate = false;

for ( int i = 0; i < n; ++i ) {
    for ( int j = 0; j < n; ++j ) {
        if ( i != j && A[ i ] == A[ j ] ) {
            duplicate = true;
            break;
        }
    }
    if ( duplicate ) {
        break;
    }
}

As here we have two nested loops within each other, we’ll have an asymptotic behavior described by f( n ) = n².

Rule of thumb: Simple programs can be analyzed by counting the nested loops of the program. A single loop over n items yields f( n ) = n. A loop within a loop yields f( n ) = n². A loop within a loop within a loop yields f( n ) = n³.

If we have a program that calls a function within a loop and we know the number of instructions the called function performs, it’s easy to determine the number of instructions of the whole program. Indeed, let’s take a look at this C example:

int i;
for ( i = 0; i < n; ++i ) {
    f( n );
}

If we know that f( n ) is a function that performs exactly n instructions, we can then know that the number of instructions of the whole program is asymptotically n², as the function is called exactly n times.

Rule of thumb: Given a series of for loops that are sequential, the slowest of them determines the asymptotic behavior of the program. Two nested loops followed by a single loop is asymptotically the same as the nested loops alone, because the nested loops dominate the simple loop.
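
A tiny sketch of that rule (my own illustration): the nested pair of loops below performs n * n basic operations, and the single loop that follows adds only n more, so the fragment as a whole is still Θ( n² ).

#include <iostream>
using namespace std;

int main( )
{
    const int n = 1000;
    long operations = 0;

    // two nested loops: n * n iterations
    for (int i = 0; i < n; ++i)
    {
        for (int j = 0; j < n; ++j)
        {
            ++operations;
        }
    }

    // a single loop that follows: only n more iterations
    for (int i = 0; i < n; ++i)
    {
        ++operations;
    }

    // n*n dominates n, so the fragment as a whole is Theta( n^2 )
    cout << operations << " operations for n = " << n << endl;
}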

Now, let’s switch over to the fancy notation that computer scientists use. When we’ve figured out the exact such f asymptotically, we’ll say that our program is Θ( f( n ) ). For example, the above programs are Θ( 1 ), Θ( n² ) and Θ( n² ) respectively. Θ( n ) is pronounced “theta of n”. Sometimes we say that f( n ), the original function counting the instructions including the constants, is Θ( something ). For example, we may say that f( n ) = 2n is a function that is Θ( n ) — nothing new here. We can also write 2n ∈ Θ( n ), which is pronounced as “two n is theta of n”. Don’t get confused about this notation: All it’s saying is that if we’ve counted the number of instructions a program needs and those are 2n, then the asymptotic behavior of our algorithm is described by n, which we found by dropping the constants. Given this notation, the following are some true mathematical statements:

  1. n⁶ + 3n ∈ Θ( n⁶ )
  2. 2ⁿ + 12 ∈ Θ( 2ⁿ )
  3. 3ⁿ + 2ⁿ ∈ Θ( 3ⁿ )
  4. nⁿ + n ∈ Θ( nⁿ )

We call this function, i.e. what we put within Θ( here ), the time complexity or just complexity of our algorithm. So an algorithm with Θ( n ) is of complexity n. We also have special names for Θ( 1 ), Θ( n ), Θ( n² ) and Θ( log( n ) ) because they occur very often. We say that a Θ( 1 ) algorithm is a constant-time algorithm, Θ( n ) is linear, Θ( n² ) is quadratic and Θ( log( n ) ) is logarithmic (don’t worry if you don’t know what logarithms are yet – we’ll get to that in a minute).

Rule of thumb: Programs with a bigger Θ run slower than programs with a smaller Θ.


Figure 3: A player that is located in the yellow dot will not see the shadowed areas. Splitting the world in small fragments and sorting them by their distance to the player is one way to solve the visibility problem.

Big-O notation

Now, it’s sometimes true that it will be hard to figure out exactly the behavior of an algorithm in this fashion as we did above, especially for more complex examples. However, we will be able to say that the behavior of our algorithm will never exceed a certain bound. This will make life easier for us, as we won’t have to specify exactly how fast our algorithm runs, even when ignoring constants the way we did before. All we’ll have to do is find a certain bound. This is explained easily with an example.

A famous problem computer scientists use for teaching algorithms is the sorting problem. In the sorting problem, an array A of size n is given (sounds familiar?) and we are asked to write a program that sorts this array. This problem is interesting because it is a pragmatic problem in real systems. For example, a file explorer needs to sort the files it displays by name so that the user can navigate them with ease. Or, as another example, a video game may need to sort the 3D objects displayed in the world based on their distance from the player’s eye inside the virtual world in order to determine what is visible and what isn’t, something called the Visibility Problem (see Figure 3). The objects that turn out to be closest to the player are those visible, while those that are further may get hidden by the objects in front of them. Sorting is also interesting because there are many algorithms to solve it, some of which are worse than others. It’s also an easy problem to define and to explain. So let’s write a piece of code that sorts an array.

Here is an inefficient way to implement sorting an array in Ruby. (Of course, Ruby supports sorting arrays using built-in functions which you should use instead, and which are certainly faster than what we’ll see here. But this is here for illustration purposes.)

b = []

n.times do
    m = a[ 0 ]
    mi = 0
    a.each_with_index do |element, i|
        if element < m
            m = element
            mi = i
        end
    end
    a.delete_at( mi )
    b << m
end

This method is called selection sort. It finds the minimum of our array (the array is denoted a above, while the minimum value is denoted m and mi is its index), puts it at the end of a new array (in our case b), and removes it from the original array. Then it finds the minimum between the remaining values of our original array, appends that to our new array so that it now contains two elements, and removes it from our original array. It continues this process until all items have been removed from the original and have been inserted into the new array, which means that the array has been sorted. In this example, we can see that we have two nested loops. The outer loop runs n times, and the inner loop runs once for each element of the array a. While the array a initially has n items, we remove one array item in each iteration. So the inner loop repeats n times during the first iteration of the outer loop, then n – 1 times, then n – 2 times and so forth, until the last iteration of the outer loop during which it only runs once.
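
For readers who would rather see it in C++, here is a minimal sketch of the same selection-sort idea (my own translation, not part of the original text); it mirrors the Ruby version: repeatedly find the minimum of a, append it to b, and erase it from a.

#include <vector>
#include <iostream>
using namespace std;

int main( )
{
    vector<int> a;      // unsorted input
    a.push_back(5); a.push_back(1); a.push_back(4); a.push_back(2);

    vector<int> b;      // receives the elements in sorted order
    int n = a.size( );

    for (int k = 0; k < n; ++k)
    {
        // find the index of the minimum of what is left in a
        int mi = 0;
        for (int i = 1; i < (int) a.size( ); ++i)
        {
            if (a[i] < a[mi]) { mi = i; }
        }
        b.push_back(a[mi]);
        a.erase(a.begin( ) + mi);   // remove it from the original, as the Ruby delete_at does
    }

    for (int i = 0; i < n; ++i) { cout << b[i] << " "; }
    cout << endl;
}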

It’s a little harder to evaluate the complexity of this program, as we’d have to figure out the sum 1 + 2 + … + (n – 1) + n. But we can for sure find an “upper bound” for it. That is, we can alter our program (you can do that in your mind, not in the actual code) to make it worse than it is and then find the complexity of that new program that we derived. If we can find the complexity of the worse program that we’ve constructed, then we know that our original program is at most that bad, or maybe better. That way, if we find out a pretty good complexity for our altered program, which is worse than our original, we can know that our original program will have a pretty good complexity too – either as good as our altered program or even better.

Let’s now think of the way to edit this example program to make it easier to figure out its complexity. But let’s keep in mind that we can only make it worse, i.e. make it take up more instructions, so that our estimate is meaningful for our original program. Clearly we can alter the inner loop of the program to always repeat exactly n times instead of a varying number of times. Some of these repetitions will be useless, but it will help us analyze the complexity of the resulting algorithm. If we make this simple change, then the new algorithm that we’ve constructed is clearly Θ( n² ), because we have two nested loops where each repeats exactly n times. If that is so, we say that the original algorithm is O( n² ). O( n² ) is pronounced “big oh of n squared”. What this says is that our program is asymptotically no worse than n². It may even be better than that, or it may be the same as that. By the way, if our program is indeed Θ( n² ), we can still say that it’s O( n² ). To help you realize that, imagine altering the original program in a way that doesn’t change it much, but still makes it a little worse, such as adding a meaningless instruction at the beginning of the program. Doing this will alter the instruction-counting function by a simple constant, which is ignored when it comes to asymptotic behavior. So a program that is Θ( n² ) is also O( n² ).

But a program that is O( n² ) may not be Θ( n² ). For example, any program that is Θ( n ) is also O( n² ) in addition to being O( n ). If we imagine that a Θ( n ) program is a simple for loop that repeats n times, we can make it worse by wrapping it in another for loop which repeats n times as well, thus producing a program with f( n ) = n². To generalize this, any program that is Θ( a ) is O( b ) when b is worse than a. Notice that our alteration to the program doesn’t need to give us a program that is actually meaningful or equivalent to our original program. It only needs to perform more instructions than the original for a given n. All we’re using it for is counting instructions, not actually solving our problem.

So, saying that our program is O( n² ) is being on the safe side: We’ve analyzed our algorithm, and we’ve found that it’s never worse than n². But it could be that it’s in fact n². This gives us a good estimate of how fast our program runs.

Because the O-complexity of an algorithm gives an upper bound for the actual complexity of an algorithm, while Θ gives the actual complexity of an algorithm, we sometimes say that the Θ gives us a tight bound. If we know that we’ve found a complexity bound that is not tight, we can also use a lower-case o to denote that. For example, if an algorithm is Θ( n ), then its tight complexity is n. Then this algorithm is both O( n ) and O( n² ). As the algorithm is Θ( n ), the O( n ) bound is a tight one. But the O( n² ) bound is not tight, and so we can write that the algorithm is o( n² ), which is pronounced “small o of n squared” to illustrate that we know our bound is not tight. It’s better if we can find tight bounds for our algorithms, as these give us more information about how our algorithm behaves, but it’s not always easy to do.

Rule of thumb: It’s easier to figure out the O-complexity of an algorithm than its Θ-complexity.

You may be getting a little overwhelmed with all this new notation by now, but let’s introduce just two more symbols before we move on to a few examples. These are easy now that you know Θ, O and o, and we won’t use them much later in this article, but it’s good to know them now that we’re at it. In the example above, we modified our program to make it worse (i.e. taking more instructions and therefore more time) and created the O notation. O is meaningful because it tells us that our program will never be slower than a specific bound, and so it provides valuable information so that we can argue that our program is good enough. If we do the opposite and modify our program to make it better and find out the complexity of the resulting program, we use the notation Ω. Ω therefore gives us a complexity that we know our program won’t be better than. This is useful if we want to prove that a program runs slowly or an algorithm is a bad one. This can be useful to argue that an algorithm is too slow to use in a particular case. For example, saying that an algorithm is Ω( n³ ) means that the algorithm isn’t better than n³. It might be Θ( n³ ), as bad as Θ( n⁴ ) or even worse, but we know it’s at least somewhat bad. So Ω gives us a lower bound for the complexity of our algorithm. Similarly to ο, we can write ω if we know that our bound isn’t tight. For example, a Θ( n³ ) algorithm is ο( n⁴ ) and ω( n² ). Ω( n ) is pronounced “big omega of n”, while ω( n ) is pronounced “small omega of n”.

The reason we use O and Ω instead of Θ even though O and Ω can also give tight bounds is that we may not be able to tell if a bound we’ve found is tight, or we may just not want to go through the process of scrutinizing it so much.

If you don’t fully remember all the different symbols and their uses, don’t worry about it too much right now. You can always come back and look them up. The most important symbols are O and Θ.

Also note that although Ω gives us a lower-bound behavior for our function (i.e. we’ve improved our program and made it perform less instructions) we’re still referring to a “worst-case” analysis. This is because we’re feeding our program the worst possible input for a given n and analyzing its behavior under this assumption.

The following table indicates the symbols we just introduced and their correspondence with the usual mathematical symbols of comparisons that we use for numbers. The reason we don’t use the usual symbols here and use Greek letters instead is to point out that we’re doing an asymptotic behavior comparison, not just a simple comparison.

Asymptotic comparison operator    Numeric comparison operator
Our algorithm is o( something )   A number is < something
Our algorithm is O( something )   A number is ≤ something
Our algorithm is Θ( something )   A number is = something
Our algorithm is Ω( something )   A number is ≥ something
Our algorithm is ω( something )   A number is > something

Rule of thumb: While all the symbols O, o, Ω, ω and Θ are useful at times, O is the one used more commonly, as it’s easier to determine than Θ and more practically useful than Ω.

Figure 4: A comparison of the functions n, √n, and log( n ). Function n, the linear function, drawn in green at the top, grows much faster than the square root function, drawn in red in the middle, which, in turn, grows much faster than the log( n ) function drawn in blue at the bottom of this plot. Even for small n such as n = 100, the difference is quite pronounced.

Logarithms

If you know what logarithms are, feel free to skip this section. As a lot of people are unfamiliar with logarithms, or just haven’t used them much recently and don’t remember them, this section is here as an introduction for them. This text is also for younger students that haven’t seen logarithms at school yet. Logarithms are important because they occur a lot when analyzing complexity. A logarithm is an operation applied to a number that makes it quite smaller – much like a square root of a number. So if there’s one thing you want to remember about logarithms is that they take a number and make it much smaller than the original (See Figure 4). Now, in the same way that square roots are the inverse operation of squaring something, logarithms are the inverse operation of exponentiating something. This isn’t as hard as it sounds. It’s better explained with an example. Consider the equation:

2ˣ = 1024

We now wish to solve this equation for x. So we ask ourselves: What is the number to which we must raise the base 2 so that we get 1024? That number is 10. Indeed, we have 2¹⁰ = 1024, which is easy to verify. Logarithms help us denote this problem using new notation. In this case, 10 is the logarithm of 1024 and we write this as log( 1024 ) and we read it as “the logarithm of 1024”. Because we’re using 2 as a base, these logarithms are called base 2 logarithms. There are logarithms in other bases, but we’ll only use base 2 logarithms in this article. If you’re a student competing in international competitions and you don’t know about logarithms, I highly recommend that you practice your logarithms after completing this article. In computer science, base 2 logarithms are much more common than any other types of logarithms. This is because we often only have two different entities: 0 and 1. We also tend to cut down one big problem into halves, of which there are always two. So you only need to know about base-2 logarithms to continue with this article.
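
A tiny sketch of that “cut it in halves” view of base-2 logarithms (my own illustration): counting how many times a number can be divided by two before reaching 1 yields its base-2 logarithm when the number is a power of two.

#include <iostream>
using namespace std;

int main( )
{
    int n = 1024;
    int log2n = 0;

    // halve n until only 1 is left; the number of halvings is the base-2 logarithm
    while (n > 1)
    {
        n /= 2;
        ++log2n;
    }

    cout << "log( 1024 ) = " << log2n << endl;  // prints 10
}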

Rule of thumb: For competition algorithms implemented in C++, once you’ve analyzed your complexity, you can get a rough estimate of how fast your program will run by expecting it to perform about 1,000,000 operations per second, where the operations you count are given by the asymptotic behavior function describing your algorithm. For example, a Θ( n ) algorithm takes about a second to process the input for n = 1,000,000.

factorial( 4 ) -> factorial( 3 ) -> factorial( 2 ) -> factorial( 1 )

Figure 5: The recursion performed by the factorial function.

Recursive complexity

Let’s now take a look at a recursive function. A recursive function is a function that calls itself. Can we analyze its complexity? The following function, written in Python, evaluates the factorial of a given number. The factorial of a positive integer number is found by multiplying it with all the previous positive integers together. For example, the factorial of 5 is 5 * 4 * 3 * 2 * 1. We denote that “5!” and pronounce it “five factorial” (some people prefer to pronounce it by screaming it out aloud like “FIVE!!!”)

def factorial( n ):
    if n == 1:
        return 1
    return n * factorial( n - 1 )

Let us analyze the complexity of this function. This function doesn’t have any loops in it, but its complexity isn’t constant either. What we need to do to find out its complexity is again to go about counting instructions. Clearly, if we pass some n to this function, it will execute itself n times. If you’re unsure about that, run it “by hand” now for n = 5 to validate that it actually works. For example, for n = 5, it will execute 5 times, as it will keep decreasing n by 1 in each call. We can see therefore that this function is then Θ( n ).

If you’re unsure about this fact, remember that you can always find the exact complexity by counting instructions. If you wish, you can now try to count the actual instructions performed by this function to find a function f( n ) and see that it’s indeed linear (recall that linear means Θ( n )).

See Figure 5 for a diagram to help you understand the recursions performed when factorial( 5 ) is called.

This should clear up why this function is of linear complexity.

Figure 6: The recursion performed by binary search. The A argument for each call is highlighted in black. The recursion continues until the array examined consists of only one element. Courtesy of Luke Francl.

Logarithmic complexity

One famous problem in computer science is that of searching for a value within an array. We solved this problem earlier for the general case. This problem becomes interesting if we have an array which is sorted and we want to find a given value within it. One method to do that is called binary search. We look at the middle element of our array: If we find it there, we’re done. Otherwise, if the value we find there is bigger than the value we’re looking for, we know that our element will be on the left part of the array. Otherwise, we know it’ll be on the right part of the array. We can keep cutting these smaller arrays in halves until we have a single element to look at. Here’s the method using pseudocode:

def binarySearch( A, n, value ):
    if n = 1:
        if A[ 0 ] = value:
            return true
        else:
            return false
    if value < A[ n / 2 ]:
        return binarySearch( A[ 0…( n / 2 - 1 ) ], n / 2 - 1, value )
    else if value > A[ n / 2 ]:
        return binarySearch( A[ ( n / 2 + 1 )…n ], n / 2 - 1, value )
    else:
        return true

This pseudocode is a simplification of the actual implementation. In practice, this method is easier described than implemented, as the programmer needs to take care of some implementation issues. There are off-by-one errors and the division by 2 may not always produce an integer value and so it’s necessary to floor() or ceil() the value. But we can assume for our purposes that it will always succeed, and we’ll assume our actual implementation in fact takes care of the off-by-one errors, as we only want to analyze the complexity of this method. If you’ve never implemented binary search before, you may want to do this in your favourite programming language. It’s a truly enlightening endeavor.
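
If you would like a concrete reference point, here is a minimal iterative C++ sketch of binary search over a sorted vector (my own illustration, handling the flooring and off-by-one details mentioned above); it is not a line-by-line translation of the pseudocode.

#include <vector>
#include <iostream>
using namespace std;

// return true if value occurs in the sorted vector A
bool binarySearch (vector<int> const &A, int value)
{
    int lo = 0;
    int hi = (int) A.size( ) - 1;
    while (lo <= hi)
    {
        int mid = lo + (hi - lo) / 2;            // midpoint, floored, without overflowing lo + hi
        if (A[mid] == value) { return true; }
        if (A[mid] < value)  { lo = mid + 1; }   // value can only be in the right half
        else                 { hi = mid - 1; }   // value can only be in the left half
    }
    return false;
}

int main( )
{
    vector<int> A;
    for (int i = 0; i < 32; ++i) { A.push_back(2 * i); }   // sorted: 0, 2, 4, ..., 62

    cout << binarySearch(A, 18) << " " << binarySearch(A, 19) << endl;  // prints 1 0
}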

See Figure 6 to help you understand the way binary search operates.

If you’re unsure that this method actually works, take a moment now to run it by hand in a simple example and convince yourself that it actually works.

Let us now attempt to analyze this algorithm. Again, we have a recursive algorithm in this case. Let’s assume, for simplicity, that the array is always cut in exactly a half, ignoring just now the + 1 and – 1 part in the recursive call. By now you should be convinced that a little change such as ignoring + 1 and – 1 won’t affect our complexity results. This is a fact that we would normally have to prove if we wanted to be prudent from a mathematical point of view, but practically it is intuitively obvious. Let’s assume that our array has a size that is an exact power of 2, for simplicity. Again this assumption doesn’t change the final results of our complexity that we will arrive at. The worst-case scenario for this problem would happen when the value we’re looking for does not occur in our array at all. In that case, we’d start with an array of size n in the first call of the recursion, then get an array of size n / 2 in the next call. Then we’ll get an array of size n / 4 in the next recursive call, followed by an array of size n / 8 and so forth. In general, our array is split in half in every call, until we reach 1. So, let’s write the number of elements in our array for every call:

  1. 0th iteration: n
  2. 1st iteration: n / 2
  3. 2nd iteration: n / 4
  4. 3rd iteration: n / 8
  5. i-th iteration: n / 2^i
  6. last iteration: 1

Notice that in the i-th iteration, our array has n / 2^i elements. This is because in every iteration we’re cutting our array in half, meaning we’re dividing its number of elements by two. This translates to multiplying the denominator by 2. If we do that i times, we get n / 2^i. Now, this procedure continues and with every larger i we get a smaller number of elements until we reach the last iteration in which we have only 1 element left. If we wish to find i to see in what iteration this will take place, we have to solve the following equation:

1 = n / 2^i

This will only be true when we have reached the final call to the binarySearch() function, not in the general case. So solving for i here will help us find in which iteration the recursion will finish. Multiplying both sides by 2i we get:

2^i = n

Now, this equation should look familiar if you read the logarithms section above. Solving for i we have:

i = log( n )

This tells us that the number of iterations required to perform a binary search is log( n ) where n is the number of elements in the original array.

If you think about it, this makes some sense. For example, take n = 32, an array of 32 elements. How many times do we have to cut this in half to get only 1 element? We get: 32 → 16 → 8 → 4 → 2 → 1. We did this 5 times, which is the logarithm of 32. Therefore, the complexity of binary search is Θ( log( n ) ).

This last result allows us to compare binary search with linear search, our previous method. Clearly, as log( n ) is much smaller than n, it is reasonable to conclude that binary search is a much faster method to search within an array than linear search, so it may be advisable to keep our arrays sorted if we want to do many searches within them.

Rule of thumb: Improving the asymptotic running time of a program often tremendously increases its performance, much more than any smaller “technical” optimizations such as using a faster programming language.

Optimal sorting

Congratulations. You now know about analyzing the complexity of algorithms, asymptotic behavior of functions and big-O notation. You also know how to intuitively figure out that the complexity of an algorithm is O( 1 ), O( log( n ) ), O( n ), O( n^2 ) and so forth. You know the symbols o, O, ω, Ω and Θ and what worst-case analysis means. If you’ve come this far, this tutorial has already served its purpose.

This final section is optional. It is a little more involved, so feel free to skip it if you feel overwhelmed by it. It will require you to focus and spend some moments working through the exercises. However, it will provide you with a very useful method in algorithm complexity analysis which can be very powerful, so it’s certainly worth understanding.

We looked at a sorting implementation above called a selection sort. We mentioned that selection sort is not optimal. An optimal algorithm is an algorithm that solves a problem in the best possible way, meaning there are no better algorithms for this. This means that all other algorithms for solving the problem have a worse or equal complexity to that optimal algorithm. There may be many optimal algorithms for a problem that all share the same complexity. The sorting problem can be solved optimally in various ways. We can use the same idea as with binary search to sort quickly. This sorting method is called mergesort.

To perform a mergesort, we will first need to build a helper function that we will then use to do the actual sorting. We will make a merge function which takes two arrays that are both already sorted and merges them together into a big sorted array. This is easily done:

def merge( A, B ):
    if empty( A ):
        return B
    if empty( B ):
        return A
    if A[ 0 ] < B[ 0 ]:
        return concat( A[ 0 ], merge( A[ 1…A_n ], B ) )
    else:
        return concat( B[ 0 ], merge( A, B[ 1…B_n ] ) )

The concat function takes an item, the “head”, and an array, the “tail”, and builds up and returns a new array which contains the given “head” item as the first thing in the new array and the given “tail” item as the rest of the elements in the array. For example, concat( 3, [ 4, 5, 6 ] ) returns [ 3, 4, 5, 6 ]. We use A_n and B_n to denote the sizes of arrays A and B respectively.

Analyzing this algorithm reveals that it has a running time of Θ( n ), where n is the length of the resulting array (n = A_n + B_n).

Utilizing this function we can build a better sorting algorithm. The idea is the following: We split the array into two parts. We sort each of the two parts recursively, then we merge the two sorted arrays into one big array. In pseudocode:

def mergeSort( A, n ):
    if n = 1:
        return A # it is already sorted
    middle = floor( n / 2 )
    leftHalf = A[ 1…middle ]
    rightHalf = A[ ( middle + 1 )…n ]
    return merge( mergeSort( leftHalf, middle ), mergeSort( rightHalf, n - middle ) )

This function is harder to understand than what we’ve gone through previously, so the following exercise may take you a few minutes.
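If it helps to see the whole thing end to end before attempting the exercise, here is one possible runnable Python version. It is a sketch rather than the pseudocode above verbatim: it uses 0-based list slicing and an iterative merge, and the names merge and merge_sort are illustrative.

def merge( A, B ):
    # Merge two already-sorted lists into one sorted list in Θ( n ) time.
    result = []
    i, j = 0, 0
    while i < len( A ) and j < len( B ):
        if A[ i ] < B[ j ]:
            result.append( A[ i ] )
            i += 1
        else:
            result.append( B[ j ] )
            j += 1
    result.extend( A[ i: ] )    # one of the two lists is exhausted;
    result.extend( B[ j: ] )    # append whatever remains of the other
    return result

def merge_sort( A ):
    if len( A ) <= 1:
        return A                # already sorted
    middle = len( A ) // 2
    return merge( merge_sort( A[ :middle ] ), merge_sort( A[ middle: ] ) )

print( merge_sort( [ 5, 2, 9, 1, 7, 3 ] ) )    # [1, 2, 3, 5, 7, 9]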

As a final example, let us analyze the complexity of mergeSort. In every step of mergeSort, we’re splitting the array into two halves of equal size, similarly to binarySearch. However, in this case, we maintain both halves throughout execution. We then apply the algorithm recursively in each half. After the recursion returns, we apply the merge operation on the result which takes Θ( n ) time.

So, we split the original array into two arrays of size n / 2 each. Then we merge those arrays, an operation that merges n elements and thus takes Θ( n ) time.

Take a look at Figure 7 to understand this recursion.

Figure 7: The recursion tree of merge sort.

Let’s see what’s going on here. Each circle represents a call to the mergeSort function. The number written in the circle indicates the size of the array that is being sorted. The top blue circle is the original call to mergeSort, where we get to sort an array of size n. The arrows indicate recursive calls made between functions. The original call to mergeSort makes two calls to mergeSort on two arrays, each of size n / 2. This is indicated by the two arrows at the top. In turn, each of these calls makes two calls of its own to mergeSort two arrays of size n / 4 each, and so forth until we arrive at arrays of size 1. This diagram is called a recursion tree, because it illustrates how the recursion behaves and looks like a tree (the root is at the top and the leaves are at the bottom, so in reality it looks like an inversed tree).

Notice that at each row in the above diagram, the total number of elements is n. To see this, take a look at each row individually. The first row contains only one call to mergeSort with an array of size n, so the total number of elements is n. The second row has two calls to mergeSort each of size n / 2. But n / 2 + n / 2 = n and so again in this row the total number of elements is n. In the third row, we have 4 calls each of which is applied on an n / 4-sized array, yielding a total number of elements equal to n / 4 + n / 4 + n / 4 + n / 4 = 4n / 4 = n. So again we get n elements. Now notice that at each row in this diagram the caller will have to perform a merge operation on the elements returned by the callees. For example, the circle indicated with red color has to sort n / 2 elements. To do this, it splits the n / 2-sized array into two n / 4-sized arrays, calls mergeSort recursively to sort those (these calls are the circles indicated with green color), then merges them together. This merge operation requires to merge n / 2 elements. At each row in our tree, the total number of elements merged is n. In the row that we just explored, our function merges n / 2 elements and the function on its right (which is in blue color) also has to merge n / 2 elements of its own. That yields n elements in total that need to be merged for the row we’re looking at.

By this argument, the complexity for each row is Θ( n ). We know that the number of rows in this diagram, also called the depth of the recursion tree, will be log( n ). The reasoning for this is exactly the same as the one we used when analyzing the complexity of binary search. We have log( n ) rows and each of them is Θ( n ), therefore the complexity of mergeSort is Θ( n * log( n ) ). This is much better than Θ( n^2 ) which is what selection sort gave us (remember that log( n ) is much smaller than n, and so n * log( n ) is much smaller than n * n = n^2). If this sounds complicated to you, don’t worry: It’s not easy the first time you see it. Revisit this section and reread about the arguments here after you implement mergesort in your favourite programming language and validate that it works.

As you saw in this last example, complexity analysis allows us to compare algorithms to see which one is better. Under these circumstances, we can now be pretty certain that merge sort will outperform selection sort for large arrays. This conclusion would be hard to draw if we didn’t have the theoretical background of algorithm analysis that we developed. In practice, indeed sorting algorithms of running time Θ( n * log( n ) ) are used. For example, the Linux kernel uses a sorting algorithm called heapsort, which has the same running time as mergesort which we explored here, namely Θ( n log( n ) ) and so is optimal. Notice that we have not proven that these sorting algorithms are optimal. Doing this requires a slightly more involved mathematical argument, but rest assured that they can’t get any better from a complexity point of view.

Having finished reading this tutorial, the intuition you developed for algorithm complexity analysis should be able to help you design faster programs and focus your optimization efforts on the things that really matter instead of the minor things that don’t matter, letting you work more productively. In addition, the mathematical language and notation developed in this article such as big-O notation is helpful in communicating with other software engineers when you want to argue about the running time of algorithms, so hopefully you will be able to do that with your newly acquired knowledge.



Dynamic Programming – Edit Distance

Dynamic programming is essentially recursion without repetition. Developing a dynamic programming algorithm generally involves two separate steps:

•  Formulate problem recursively. Write down a formula for the whole problem as a simple combination of answers to smaller subproblems.

•  Build solution to recurrence from bottom up. Write an algorithm that starts with base cases and works its way up to the final solution.

Dynamic programming algorithms need to store the results of intermediate subproblems. This is often but not always done with some kind of table. We will now cover a number of examples of problems whose solutions are based on the dynamic programming strategy.

Edit Distance

The words “computer” and “commuter” are very similar, and a change of just one letter, p → m, will change the first word into the second. The word “sport” can be changed into “sort” by the deletion of the ‘p’, or equivalently, ‘sort’ can be changed into ‘sport’ by the insertion of ‘p’. The edit distance of two strings, s1 and s2, is defined as the minimum number of point mutations required to change s1 into s2, where a point mutation is one of:

•  change a letter,

•  insert a letter or

•  delete a letter

For example, the edit distance between FOOD and MONEY is at most four:

FOOD → MOOD → MON–D → MONED → MONEY

Edit Distance: Applications

There are numerous applications of the Edit Distance algorithm. Here are some examples:

Spelling Correction

If a text contains a word that is not in the dictionary, a ‘close’ word, i.e. one with a small edit distance, may be suggested as a correction. Most word processing applications, such as Microsoft Word, have a spell-checking and correction facility. When Word, for example, finds an incorrectly spelled word, it suggests possible replacements.

Plagiarism Detection

If someone copies, say, a C program and makes a few changes here and there, for example, changing variable names or adding a comment or two, the edit distance between the source and the copy will be small. The edit distance then provides an indication of a similarity that, in some situations, is too close to be a coincidence.

Computational Molecular Biology

DNA is a polymer. The monomer units of DNA are nucleotides, and the polymer is known as a “polynucleotide.” Each nucleotide consists of a 5-carbon sugar (deoxyribose), a nitrogen-containing base attached to the sugar, and a phosphate group. There are four different types of nucleotides found in DNA, differing only in the nitrogenous base. The four nucleotides are given one-letter abbreviations as shorthand for the four bases:

•  A-adenine

•  G-guanine

•  C-cytosine

•  T-thymine

[Figure: double helix of a DNA molecule with its nucleotides.]

Edit-distance-like algorithms are used to compute a distance between DNA sequences (strings over A, C, G, T) or protein sequences (strings over an alphabet of 20 amino acids), for various purposes, e.g.:

•  to find genes or proteins that may have shared functions or properties

•  to infer family relationships and evolutionary trees over different organisms.

Speech Recognition

Algorithms similar to those for the edit-distance problem are used in some speech recognition systems: they find the closest match between a new utterance and one in a library of classified utterances.

Edit Distance Algorithm

A better way to display this editing process is to place one word above the other, here MATHS above ARTS:

M  A  _  T  H  S
A  _  R  T  _  S
S  D  I  M  D  M

The first word has a gap for every insertion (I) and the second word has a gap for every deletion (D). Columns with two different characters correspond to substitutions (S). Matches (M) do not count. The edit transcript is defined as a string over the alphabet M, S, I, D that describes a transformation of one string into another. For example, the transcript above reads

S      D      I      M      D      M
1  +   1  +   1  +   0  +   1  +   0   =   4

In general, it is not easy to determine the optimal edit distance. For example, the distance between ALGORITHM and ALTRUISTIC is at most 6:

A  L  G  O  R  _  I  _  T  H  M
A  L  _  T  R  U  I  S  T  I  C

Is this optimal?

Edit Distance: Dynamic Programming Algorithm

Suppose we have an m-character string A and an n-character string B. Define E(i, j) to be the edit distance between the first i characters of A and the first j characters of B. For example,

A  L  G  O  R  _  I  _  T  H  M
A  L  _  T  R  U  I  S  T  I  C

The edit distance between entire strings A and B is E(m, n). The gap representation for the edit sequences has a crucial “optimal substructure” property. If we remove the last column, the remaining columns must represent the shortest edit sequence for the remaining substrings. The edit distance is 6 for the following two words.

A  L  G  O  R  _  I  _  T  H  M
A  L  _  T  R  U  I  S  T  I  C

If we remove the last column, the edit distance reduces to 5.

A  L  G  O  R  _  I  _  T  H
A  L  _  T  R  U  I  S  T  I

We can use the optimal substructure property to devise a recursive formulation of the edit distance problem. There are a couple of obvious base cases:

•  The only way to convert an empty string into a string of j characters is by doing j insertions. Thus

E(0, j) = j

•  The only way to convert a string of i characters into the empty string is with i deletions:

E(i, 0) = i

There are four possibilities for the last column in the shortest possible edit sequence:

Deletion: Last entry in bottom row is empty.

[In the ALGORITHM/ALTRUISTIC alignment, i = 3 and j = 2: the last column pairs A[ 3 ] = ‘G’ with a gap in the bottom row.]

In this case

E(i, j) = E(i − 1, j) + 1

Insertion: The last entry in the top row is empty.

[Here i = 5 and j = 5: the last column pairs a gap in the top row with B[ 5 ] = ‘U’.]

In this case

E(i, j) = E(i, j − 1) + 1

Substitution: Both rows have characters in the last column.

[Here i = 4 and j = 3: the last column pairs A[ 4 ] = ‘O’ with B[ 3 ] = ‘T’.]

If the characters are different, then

E(i, j) = E(i − 1, j − 1) + 1

[Here i = 5 and j = 4: the last column pairs A[ 5 ] = ‘R’ with B[ 4 ] = ‘R’.]

If the characters are the same, no substitution is needed:
E(i, j) = E(i − 1, j − 1)

Thus the edit distance E(i, j) is the smallest of the four possibilities:

E(i, j) = min of:

    E(i − 1, j) + 1
    E(i, j − 1) + 1
    E(i − 1, j − 1) + 1    if A[i] ≠ B[j]
    E(i − 1, j − 1)        if A[i] = B[j]
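To make the four cases concrete, here is a direct transcription of this recurrence into Python. It is deliberately naive: without memoization it recomputes the same subproblems over and over and is exponentially slow, which is exactly the problem the dynamic programming version below fixes. The name edit_distance_recursive is illustrative.

def edit_distance_recursive( A, B, i = None, j = None ):
    # E( i, j ): edit distance between the first i characters of A
    # and the first j characters of B (indices are 1-based, as in the text).
    if i is None:
        i, j = len( A ), len( B )
    if i == 0:
        return j    # E(0, j) = j insertions
    if j == 0:
        return i    # E(i, 0) = i deletions
    deletion     = edit_distance_recursive( A, B, i - 1, j ) + 1
    insertion    = edit_distance_recursive( A, B, i, j - 1 ) + 1
    match_or_sub = edit_distance_recursive( A, B, i - 1, j - 1 ) \
                   + ( 0 if A[ i - 1 ] == B[ j - 1 ] else 1 )
    return min( deletion, insertion, match_or_sub )

print( edit_distance_recursive( "ARTS", "MATHS" ) )    # 3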

Consider the example of the edit distance between the words “ARTS” and “MATHS”:

A    R    T    S

M    A    T    H    S

The edit distance is E(4, 5). If we use the recursion to compute it, we get

E(4, 5) = min of:

    E(3, 5) + 1
    E(4, 4) + 1
    E(3, 4) + 1    if A[4] ≠ B[5]
    E(3, 4)        if A[4] = B[5]

Recursion clearly leads to the same repetitive call pattern that we saw with the Fibonacci sequence. To avoid this, we will use the DP approach. We will build the solution bottom-up. We will use the base case E(0, j) to fill the first row and the base case E(i, 0) to fill the first column. We will then fill the remaining entries of the E matrix row by row.
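The following is a possible Python sketch of exactly this bottom-up fill, using 0-based strings; the name edit_distance is illustrative. The hand-filled tables below show the same values, except that their rows range over MATHS and their columns over ARTS.

def edit_distance( A, B ):
    m, n = len( A ), len( B )
    # E[ i ][ j ] = edit distance between the first i characters of A
    #               and the first j characters of B.
    E = [ [ 0 ] * ( n + 1 ) for _ in range( m + 1 ) ]
    for j in range( n + 1 ):
        E[ 0 ][ j ] = j            # base case: j insertions
    for i in range( m + 1 ):
        E[ i ][ 0 ] = i            # base case: i deletions
    for i in range( 1, m + 1 ):
        for j in range( 1, n + 1 ):
            cost = 0 if A[ i - 1 ] == B[ j - 1 ] else 1
            E[ i ][ j ] = min( E[ i - 1 ][ j ] + 1,          # deletion
                               E[ i ][ j - 1 ] + 1,          # insertion
                               E[ i - 1 ][ j - 1 ] + cost )  # substitution or match
    return E[ m ][ n ]

print( edit_distance( "ARTS", "MATHS" ) )    # 3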

          A     R     T     S
     0   →1    →2    →3    →4
M   ↓1
A   ↓2
T   ↓3
H   ↓4
S   ↓5

First row and first column entries using the base cases

We can now fill in the second row. The table not only shows the values of the cells E[i, j] but also arrows that indicate how each value was computed from E[i − 1, j], E[i, j − 1] and E[i − 1, j − 1]. Thus, if a cell E[i, j] has a down arrow from E[i − 1, j], the minimum was found using E[i − 1, j]. For a right arrow, the minimum was found using E[i, j − 1]. For a diagonal down-right arrow, the minimum was found using E[i − 1, j − 1]. Some cells have two arrows pointing to them; in such a case, the minimum could be obtained either from the diagonal E[i − 1, j − 1] or from one of E[i − 1, j] and E[i, j − 1]. We will use these arrows later to determine the edit script.

[Tables: computing E[1, 1] = 1 and E[1, 2] = 2 in the row for M, with arrows marking which neighbouring entry produced each minimum.]

[Tables: computing E[1, 3] = 3 and E[1, 4] = 4, which completes the row for M.]

An edit script can be extracted by following a unique path from E[0, 0] to E[4, 5]. There are three possible paths in the current example. Let us follow these paths and compute the edit script. In an actual implementation of the dynamic programming version of the edit distance algorithm, the arrows would be recorded using an appropriate data structure. For example, each cell in the matrix could be a record with fields for the value (numeric) and flags for the three incoming arrows.

          A     R     T     S
     0     1     2     3     4
M    1     1     2     3     4
A    2     1     2     3     4
T    3     2     2     2     3
H    4     3     3     3     3
S    5     4     4     4     3

The final table with all E[i, j] entries computed

Solution path 1:

1 + 0 + 1 + 1 + 0 = 3

[The same table again, with red arrows from E[0, 0] to E[4, 5] showing the paths that can be followed to extract the possible edit scripts.]

Solution path 2:

1 + 1 + 0 + 1 + 0 = 3

Solution path 3:

1 + 0 + 1 + 0 + 1 + 0 = 3
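As a sketch of how an edit script could be extracted in an actual implementation: instead of storing arrow flags, the version below re-derives, while walking back from E[m][n] to E[0][0], which neighbouring cell each minimum came from. Ties are broken arbitrarily (diagonal first), so it returns just one of the optimal scripts; the name edit_script is illustrative.

def edit_script( A, B ):
    # Fill the E table exactly as before.
    m, n = len( A ), len( B )
    E = [ [ 0 ] * ( n + 1 ) for _ in range( m + 1 ) ]
    for j in range( n + 1 ):
        E[ 0 ][ j ] = j
    for i in range( m + 1 ):
        E[ i ][ 0 ] = i
    for i in range( 1, m + 1 ):
        for j in range( 1, n + 1 ):
            cost = 0 if A[ i - 1 ] == B[ j - 1 ] else 1
            E[ i ][ j ] = min( E[ i - 1 ][ j ] + 1, E[ i ][ j - 1 ] + 1, E[ i - 1 ][ j - 1 ] + cost )
    # Walk back from E[m][n] to E[0][0], recording one operation per step.
    script = []
    i, j = m, n
    while i > 0 or j > 0:
        cost = 0 if i > 0 and j > 0 and A[ i - 1 ] == B[ j - 1 ] else 1
        if i > 0 and j > 0 and E[ i ][ j ] == E[ i - 1 ][ j - 1 ] + cost:
            script.append( 'M' if cost == 0 else 'S' )    # match or substitution
            i, j = i - 1, j - 1
        elif i > 0 and E[ i ][ j ] == E[ i - 1 ][ j ] + 1:
            script.append( 'D' )                          # delete A[i]
            i -= 1
        else:
            script.append( 'I' )                          # insert B[j]
            j -= 1
    return ''.join( reversed( script ) )

print( edit_script( "ARTS", "MATHS" ) )    # IMSSM (cost 3: the three non-M operations)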

Analysis of DP Edit Distance

There are Θ(n^2) entries in the matrix (more generally, Θ(m · n) for strings of lengths m and n). Each entry E(i, j) takes Θ(1) time to compute, so the total running time is Θ(n^2).

Linear Time Sorting – Bucket or Bin Sort

Assume that the keys of the items that we wish to sort lie in a small fixed range and that there is only one item with each value of the key. Then we can sort with the following procedure:

1. Set up an array of “bins” – one for each value of the key – in order,

2. Examine each item and use the value of the key to place it in the appropriate bin.

Now our collection is sorted and it only took n operations, so this is an O(n) operation. However, note that it will only work under very restricted conditions. To understand these restrictions, let’s be a little more precise about the specification of the problem and assume that there are m values of the key. To recover our sorted collection, we need to examine each bin. This adds a third step to the algorithm above,

3. Examine each bin to see whether there’s an item in it.

which requires m operations. So the algorithm’s time becomes:

T(n) = c1·n + c2·m

and it is strictly O(n + m). If m ≤ n, this is clearly O(n).  However if m >> n, then it is O(m).  An implementation of bin sort might look like:

BUCKETSORT( array A, int n, int M )
    // Pre-condition: for 1 ≤ i ≤ n, 0 ≤ A[i] < M
    // Mark all the bins empty
    for i ← 0 to M − 1
        do bin[i] ← Empty
    for i ← 1 to n
        do bin[A[i]] ← A[i]

If there are duplicates, then each bin can be replaced by a linked list. The third step then becomes:

3. Link all the lists into one list.

We can add an item to a linked list in O(1) time. There are n items requiring O(n) time. Linking a list to another list simply involves making the tail of one list point to the other, so it is O(1). Linking m such lists obviously takes O(m) time, so the algorithm is still O(n + m).
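As a rough Python sketch of bin sort with each bin held as a Python list (playing the role of the linked lists just described); the name bucket_sort and the assumption that keys are integers with 0 ≤ key < M are mine:

def bucket_sort( A, M ):
    # Pre-condition: every key in A is an integer with 0 <= key < M.
    bins = [ [] for _ in range( M ) ]    # step 1: one (initially empty) bin per key value
    for x in A:                          # step 2: drop each item into its bin -- O(n)
        bins[ x ].append( x )
    result = []
    for b in bins:                       # step 3: link the bins back together -- O(m)
        result.extend( b )
    return result

print( bucket_sort( [ 5, 1, 4, 1, 2, 9, 0 ], 10 ) )    # [0, 1, 1, 2, 4, 5, 9]

Because equal keys are appended in the order they are encountered, this version also keeps duplicates in their original relative order.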

Linear Time Sorting, Counting Sort & Radix Sort

The lower bound of sorting algorithms implies that if we hope to sort numbers faster than O(n log n), we cannot do it by making comparisons alone. Is it possible to sort without making comparisons?

The answer is yes, but only under very restrictive circumstances. Many applications involve sorting small integers (e.g. sorting characters, exam scores, etc.). We present three algorithms based on the theme of speeding up sorting in special cases, by not making comparisons.

Counting sort assumes that the numbers to be sorted are in the range 1 to k where k is small. The basic idea is to determine the rank of each number in final sorted array.

The rank of an item is the number of elements that are less than or equal to it. Once we know the ranks, we simply copy numbers to their final position in an output array.

The question is: how do we find the rank of an element without comparing it to the other elements of the array? The algorithm uses three arrays.

A[1..n] holds the initial input, B[1..n] holds the sorted output and C[1..k] is an array of integers.

C[x] is the rank of x in A, where x ∈ [1..k].

The algorithm is remarkably simple, but deceptively clever. The algorithm operates by first constructing C. This is done in two steps.

First we set C[x] to be the number of elements A[j] that are equal to x. We can do this by initializing C to zero and then, for each j from 1 to n, incrementing C[A[j]] by 1.

Thus, if A[j] = 5, then the 5th element of C is incremented, indicating that we have seen one more 5. To determine the number of elements that are less than or equal to x, we replace C[x] with the sum of the elements in the subarray C[1..x]. This is done by just keeping a running total of the elements of C.

C[x] now contains the rank of x. This means that if x = A[j] then the final position of A[j] should be at position C[x] in the final sorted array.

Thus, we set B[C[x]] = A[j]. Notice that we need to be careful if there are duplicates, since we do not want them to overwrite the same location of B. To do this, we decrement C[A[j]] after copying.

COUNTING-SORT( array A, array B, int k )
    for i ← 1 to k
        do C[i] ← 0                              // k times
    for j ← 1 to length[A]
        do C[A[j]] ← C[A[j]] + 1                 // n times
    // C[i] now contains the number of elements = i
    for i ← 2 to k
        do C[i] ← C[i] + C[i − 1]                // k − 1 times
    // C[i] now contains the number of elements ≤ i
    for j ← length[A] downto 1
        do B[C[A[j]]] ← A[j]
           C[A[j]] ← C[A[j]] − 1                 // n times

There are four (un-nested) loops, executed k times, n times, k − 1 times, and n times, respectively, so the total running time is Θ(n + k) time. If k = O(n), then the total running time is Θ(n).

Counting sort is not an in-place sorting algorithm, but it is stable. Stability is important because data are often carried with the keys being sorted. Radix sort (which uses counting sort as a subroutine) relies on it to work correctly. Stability is achieved by running the last loop down from n to 1, and not the other way around.
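Here is one possible Python rendering of COUNTING-SORT, assuming integer keys in the range 1..k. It returns the output array rather than filling in a B passed by the caller, and because Python lists are 0-based the final placement is shifted by one; the name counting_sort is illustrative.

def counting_sort( A, k ):
    # Pre-condition: every key in A is an integer with 1 <= key <= k.
    n = len( A )
    B = [ 0 ] * n              # output array
    C = [ 0 ] * ( k + 1 )      # C[x] will hold the rank of x (index 0 unused)
    for x in A:                # count how many times each key occurs
        C[ x ] += 1
    for x in range( 2, k + 1 ):
        C[ x ] += C[ x - 1 ]   # running total: C[x] = number of keys <= x
    for j in range( n - 1, -1, -1 ):    # scan right to left so the sort is stable
        B[ C[ A[ j ] ] - 1 ] = A[ j ]   # place A[j] at its rank (shifted to 0-based)
        C[ A[ j ] ] -= 1
    return B

print( counting_sort( [ 4, 1, 3, 4, 2, 1 ], 4 ) )    # [1, 1, 2, 3, 4, 4]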

Radix Sort

The main shortcoming of counting sort is that it is only useful for small integers, i.e., 1..k where k is small. If k were a million or more, the size of the rank array would also be a million. Radix sort provides a nice workaround for this limitation by sorting numbers one digit at a time.

576            49[4]            9[5]4            [1]76            176

494            19[4]            5[7]6            [1]94            194

194            95[4]            1[7]6            [2]78            278

296    ⇒     57[6]    ⇒     2[7]8    ⇒     [2]96    ⇒     296

278            29[6]            4[9]4            [4]94            494

176            17[6]            1[9]4            [5]76            576

954            27[8]            2[9]6            [9]54            954

Here is the algorithm that sorts A[1..n] where each number is d digits long.

RADIX-SORT( array A, int n, int d )
    for i ← 1 to d
        do stably sort A with respect to the i-th lowest-order digit
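A possible Python sketch of this, using a stable counting sort on each decimal digit; the helper name stable_sort_by_digit and the base-10 assumption are mine, and the example reuses the seven numbers from the table above.

def stable_sort_by_digit( A, i ):
    # Stable counting sort of A keyed on the i-th lowest-order decimal digit.
    digit = lambda x: ( x // 10 ** i ) % 10
    C = [ 0 ] * 10
    for x in A:
        C[ digit( x ) ] += 1
    for v in range( 1, 10 ):
        C[ v ] += C[ v - 1 ]        # C[v] = number of keys whose digit is <= v
    B = [ 0 ] * len( A )
    for x in reversed( A ):         # right to left keeps equal digits in input order
        C[ digit( x ) ] -= 1
        B[ C[ digit( x ) ] ] = x
    return B

def radix_sort( A, d ):
    # Sort non-negative integers of at most d decimal digits,
    # one digit at a time, least significant digit first.
    for i in range( d ):
        A = stable_sort_by_digit( A, i )
    return A

print( radix_sort( [ 576, 494, 194, 296, 278, 176, 954 ], 3 ) )
# [176, 194, 278, 296, 494, 576, 954]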