Blog Archives

Sorting Localized Strings

Sorting becomes more complicated when you’re working in different local language strings, and the standard library solves this problem. The facet collate provides a member function compare that works like strcmp: it returns -1 if the first string is less than the second, 0 if they are equal, and 1 if the first string is greater than the second. Unlike strcmp, collate::compare uses the character semantics of the target locale.
Below code presents the function localeLessThan, which returns true if the first argument is less than the second according to the global locale. The most important part of the function is the call to compare:
col.compare(pb1, // Pointer to the first char
pb1 + s1.size( ), // Pointer to one past the last char
pb2,
pb2 + s2.size( ))
Depending on the execution character set of your implementation, below code may return the results I showed earlier or not. But if you want to ensure string comparison works in a locale-specific manner, you should use collate::compare. Of course, the standard does not require an implementation to support any locales other than “C,” so be sure to test for all the locales you support.

The locale class has built-in support for comparing characters in a given locale by overriding operator. You can use an instance of the locale class as your comparison functor when you call any standard function that takes a functor for comparison.

#include <iostream>
#include <locale>
#include <string>
#include <vector>
#include <algorithm>

using namespace std;

bool localeLessThan (const string& s1, const string& s2)
{
const collate<char>& col = use_facet<collate<char> >(locale( )); // Use the global locale
const char* pb1 = s1.data( );
const char* pb2 = s2.data( );
return (col.compare(pb1, pb1 + s1.size( ), pb2, pb2 + s2.size( )) < 0);
}

int main( )
{
// Create two strings, one with a German character
string s1 = "diät";
string s2 = "dich";
vector<string> v;
v.push_back(s1);
v.push_back(s2);

// Sort without giving a locale, which will sort according to the current global locale's rules.
sort(v.begin( ), v.end( ));
for (vector<string>::const_iterator p = v.begin( ); p != v.end( ); ++p)
cout << *p << endl;

// Set the global locale to German, and then sort
locale::global(locale("german"));
sort(v.begin( ), v.end( ), localeLessThan);

for (vector<string>::const_iterator p = v.begin( );
p != v.end( ); ++p)
cout << *p << endl;
}

The first sort follows ASCII sorting convention, and therefore the output looks like
this:
dich
diät

The second sort uses the proper ordering according to German semantics, and it is just the opposite:
diät
dich

Implementing Fixed-Point Numbers in C++

A fixed-point number, like a floating-point number, is an approximate representation of a real number. A floating-point number is stored as a mantissa (m), and an exponent (e), to form the equation m * be, where b is some constant. A fixed-point number is almost the same but the exponent is also a constant. This constant is passed to the basic_fixed_real in below class template as a template parameter.
By representing e as a constant, it allows fixed-point numbers to be represented internally as integers and for the arithmetic operations on them to be performed using integer arithmetic. This can often improve the speed of basic arithmetic operations
especially addition and subtraction. Fixed-point representations are less flexible than floating-point numbers, as they can only represent a narrow range of values. The fixed_real type in below function has a range that can only represent values from –2,097,151 to +2,097,151 with a precision of 1/1,024.
Implementing addition and subtraction of fixed-point numbers is straightforward enough: I simply add or subtract the underlying representation. To perform division and multiplication, I need an extra step of shifting the mantissa left or right to adjust for the binary point.

Below class template provides the implementation of a fixed-point real number, where the number of places to the right of the binary point is a template parameter. For instance basic_fixed_real<10> has 10 binary digits to the right of the binary point, allowing it to represent numbers up to a precision of 1/1,024.


#include <iostream>
using namespace std;

template<int E>
struct BasicFixedReal
{
typedef BasicFixedReal self;
static const int factor = 1 << (E - 1);
BasicFixedReal( ) : m(0) { }
BasicFixedReal(double d) : m(static_cast(d * factor)) { }
self& operator+=(const self& x) { m += x.m; return *this; }
self& operator-=(const self& x) { m -= x.m; return *this; }
self& operator*=(const self& x) { m *= x.m; m >>= E; return *this; }
self& operator/=(const self& x) { m /= x.m; m *= factor; return *this; }
self& operator*=(int x) { m *= x; return *this; }
self& operator/=(int x) { m /= x; return *this; }
self operator-( ) { return self(-m); }
double toDouble( ) const { return double(m) / factor; }
// friend functions
friend self operator+(self x, const self& y) { return x += y; }
friend self operator-(self x, const self& y) { return x -= y; }
friend self operator*(self x, const self& y) { return x *= y; }
friend self operator/(self x, const self& y) { return x /= y; }
// comparison operators
friend bool operator==(const self& x, const self& y) { return x.m == y.m; }
friend bool operator!=(const self& x, const self& y) { return x.m != y.m; }
friend bool operator>(const self& x, const self& y) { return x.m > y.m; }
friend bool operator<(const self& x, const self& y) { return x.m < y.m; }
friend bool operator>=(const self& x, const self& y) { return x.m >= y.m; }
friend bool operator<=(const self& x, const self& y) { return x.m <= y.m; }
private:
int m;
};
typedef BasicFixedReal<10> FixedReal;
int main( ) {
FixedReal x(0);
for (int i=0; i < 100; ++i) {

x += FixedReal(0.0625);
}
cout << x.toDouble( ) << endl;

}

The program outputs:
6.25

 

Representing Large Fixed-Width Integers in C++

Large integers are common in many applications. In cryptography, for example, integers of 1,000 bits and larger are not uncommon. The implementation I presented here is based on a binary representation of numbers using a bitset, at a cost of some performance. What I lost in performance I more than made up for in simplicity. A more efficient implementation of arbitrary precision numbers could easily fill the book.

The BigInt template below uses the bitset from the header to (from my last post) allow you to represent unsigned integers using a fixed number of bits specified as a template parameter.


//big_int.hpp
#ifndef BIG_INT_HPP
#define BIG_INT_HPP
#include <bitset>
#include "bitset_arithmetic.hpp" // from my last post of bitset manipulation.
template<unsigned int N>
class BigInt

{
typedef BigInt self;
public:
BigInt( ) : bits( ) { }
BigInt(const self& x) : bits(x.bits) { }
BigInt(unsigned long x) {
int n = 0;
while (x) {
bits[n++] = x & 0x1;
x >>= 1;
}
}
explicit BigInt(const std::bitset& x) : bits(x) { }
// public functions
bool operator[](int n) const { return bits[n]; }
unsigned long toUlong( ) const { return bits.to_ulong( ); }
// operators
self& operator<<=(unsigned int n) {
bits <<= n;
return *this;
}
self& operator>>=(unsigned int n) {
bits >>= n;
return *this;
}
self operator++(int) {
self i = *this;
operator++( );
return i;
}
self operator--(int) {
self i = *this;
operator--( );
return i;
}
self& operator++( ) {
bool carry = false;
bits[0] = fullAdder(bits[0], 1, carry);
for (int i = 1; i < N; i++) {
bits[i] = fullAdder(bits[i], 0, carry);
}
return *this;
}
self& operator--( ) {
bool borrow = false;
bits[0] = fullSubtractor(bits[0], 1, borrow);
for (int i = 1; i < N; i++) {
bits[i] = fullSubtractor(bits[i], 0, borrow);
}
return *this;
}
self& operator+=(const self& x) {
bitsetAdd(bits, x.bits);
return *this;
}
self& operator-=(const self& x) {
bitsetSubtract(bits, x.bits);
return *this;
}
self& operator*=(const self& x) {
bitsetMultiply(bits, x.bits);
return *this;
}
self& operator/=(const self& x) {
std::bitset tmp;
bitsetDivide(bits, x.bits, bits, tmp);
return *this;
}
self& operator%=(const self& x) {
std::bitset tmp;
bitsetDivide(bits, x.bits, tmp, bits);
return *this;
}
self operator~( ) const { return ~bits; }
self& operator&=(self x) { bits &= x.bits; return *this; }
self& operator|=(self x) { bits |= x.bits; return *this; }
self& operator^=(self x) { bits ^= x.bits; return *this; }
// friend functions
friend self operator<<(self x, unsigned int n) { return x <<= n; }
friend self operator>>(self x, unsigned int n) { return x >>= n; }
friend self operator+(self x, const self& y) { return x += y; }
friend self operator-(self x, const self& y) { return x -= y; }
friend self operator*(self x, const self& y) { return x *= y; }
friend self operator/(self x, const self& y) { return x /= y; }
friend self operator%(self x, const self& y) { return x %= y; }
friend self operator^(self x, const self& y) { return x ^= y; }
friend self operator&(self x, const self& y) { return x &= y; }
friend self operator|(self x, const self& y) { return x |= y; }
// comparison operators
friend bool operator==(const self& x, const self& y) {
return x.bits == y.bits;
}
friend bool operator!=(const self& x, const self& y) {
return x.bits != y.bits;
}
friend bool operator>(const self& x, const self& y) {
return bitsetGt(x.bits, y.bits);
}
friend bool operator<(const self& x, const self& y) {
return bitsetLt(x.bits, y.bits);
}
friend bool operator>=(const self& x, const self& y) {
return bitsetGtEq(x.bits, y.bits);
}
friend bool operator<=(const self& x, const self& y) {
return bitsetLtEq(x.bits, y.bits);
}
private:
std::bitset<N> bits;
};

The BigInt template class could be used to represent factorials, as shown below,


//Using the big_int class
#include "big_int.hpp"
#include <iostream>
#include <vector>
#include <iterator>
#include <algorithm>
using namespace std;
void outputBigInt(BigInt<1024> x) {
vector<int> v;
if (x == 0) {
cout << 0;
return;
}
while (x > 0) {
v.push_back((x % 10).to_ulong( ));
x /= 10;
}
copy(v.rbegin( ), v.rend( ), ostream_iterator(cout, ""));
cout << endl;
}
int main( ) {
BigInt<1024> n(1);
// compute 32 factorial
for (int i=1; i <= 32; ++i) {
n *= i;
}
outputBigInt(n);
}

The program output is:
263130836933693530167218012160000000

 

Performing Arithmetic on Bitsets in C++

The bitset class template comes with basic operations for manipulating sets of bits but doesn’t provide any arithmetic or comparison operations. This is because the library can’t safely assume what kind of numerical type a programmer might expect an arbitrary set of bits to represent.The functions below treat a bitset as a representation of an unsigned integer type, and provide you with functions for adding, subtracting, multiplying, dividing, and comparing them. These functions can provide a basis for writing custom-sized integer types. I did not use the most efficient algorithms I could for below function. Instead I chose the simplest possible algorithms because they are more easily understood. A much more efficient implementation would use similar algorithms, but would operate on words rather than single bits.

The functions below provide functions that allow arithmetic and comparison of bitset class template from the header as if it represents an unsigned integer.


#include <stdexcept>
#include <bitset>
bool fullAdder(bool b1, bool b2, bool& carry)

{
bool sum = (b1 ^ b2) ^ carry;
carry = (b1 && b2) || (b1 && carry) || (b2 && carry);
return sum;
}
bool fullSubtractor(bool b1, bool b2, bool& borrow)

{
bool diff;
if (borrow)

{
diff = !(b1 ^ b2);
borrow = !b1 || (b1 && b2);
}
else

{
diff = b1 ^ b2;
borrow = !b1 && b2;
}
return diff;
}
template<unsigned int N>
bool bitsetLtEq(const std::bitset& x, const std::bitset& y)
{
for (int i=N-1; i >= 0; i--)

{
if (x[i] && !y[i]) return false;
if (!x[i] && y[i]) return true;
}
return true;
}

template<unsigned int N>
bool bitsetLt(const std::bitset& x, const std::bitset& y)
{
for (int i=N-1; i >= 0; i--) {
if (x[i] && !y[i]) return false;
if (!x[i] && y[i]) return true;
}
return false;
}
template<unsigned int N>
bool bitsetGtEq(const std::bitset& x, const std::bitset& y)
{
for (int i=N-1; i >= 0; i--) {
if (x[i] && !y[i]) return true;
if (!x[i] && y[i]) return false;
}
return true;
}
template<unsigned int N>
bool bitsetGt(const std::bitset& x, const std::bitset& y)
{
for (int i=N-1; i >= 0; i--)

{
if (x[i] && !y[i]) return true;
if (!x[i] && y[i]) return false;
}
return false;
}
template<unsigned int N>
void bitsetAdd(std::bitset& x, const std::bitset& y)
{
bool carry = false;
for (int i = 0; i < N; i++)

{
x[i] = fullAdder(x[i], y[i], carry);
}
}
template<unsigned int N>
void bitsetSubtract(std::bitset& x, const std::bitset& y)

{
bool borrow = false;
for (int i = 0; i < N; i++)

{
if (borrow)

{
if (x[i])

{
x[i] = y[i];
borrow = y[i];
}
else

{
x[i] = !y[i];
borrow = true;
}

}
else

{
if (x[i])

{
x[i] = !y[i];
borrow = false;
}
else

{
x[i] = y[i];
borrow = y[i];
}
}
}
}
template<unsigned int N>
void bitsetMultiply(std::bitset& x, const std::bitset& y)
{
std::bitset tmp = x;
x.reset( );
// we want to minimize the number of times we shift and add
if (tmp.count( ) < y.count( ))

{
for (int i=0; i < N; i++)
if (tmp[i]) bitsetAdd(x, y << i);
}
else

{
for (int i=0; i < N; i++)
if (y[i]) bitsetAdd(x, tmp << i);
}
}
template<unsigned int N>
void bitsetDivide(std::bitset x, std::bitset y,
std::bitset<N>& q, std::bitset<N>& r)
{
if (y.none( ))

{
throw std::domain_error("division by zero undefined");
}
q.reset( );
r.reset( );
if (x.none( ))

{
return;
}
if (x == y)

{
q[0] = 1;
return;
}
r = x;
if (bitsetLt(x, y))

{
return;
}

// count significant digits in divisor and dividend
unsigned int sig_x;
for (int i=N-1; i>=0; i--)

{
sig_x = i;
if (x[i]) break;
}
unsigned int sig_y;
for (int i=N-1; i>=0; i--)

{
sig_y = i;
if (y[i]) break;
}
// align the divisor with the dividend
unsigned int n = (sig_x - sig_y);
y <<= n;
// make sure the loop executes the right number of times
n += 1;
// long division algorithm, shift, and subtract
while (n--)
{
// shift the quotient to the left
if (bitsetLtEq(y, r))
{
// add a new digit to quotient
q[n] = true;
bitsetSubtract(r, y);
}
// shift the divisor to the right
y >>= 1;
}
}

Using the bitset_arithmetic.hpp functions:

#include "bitset_arithmetic.hpp"
#include <bitset>
#include <iostream>
#include <string>
using namespace std;
int main( )

{
bitset<10> bits1(string("100010001"));
bitset<10> bits2(string("000000011"));
bitsetAdd(bits1, bits2);
cout << bits1.to_string<char, char_traits<char="">, allocator >( ) << endl;
}

The program produces the following output:
0100010100