Sorting Localized Strings

Sorting becomes more complicated when you’re working with strings in languages other than English, but the standard library provides a solution. The collate facet provides a member function, compare, that works like strcmp: it returns -1 if the first string is less than the second, 0 if they are equal, and 1 if the first string is greater than the second. Unlike strcmp, collate::compare uses the collation rules of the target locale.
The code below presents the function localeLessThan, which returns true if the first argument is less than the second according to the global locale. The most important part of the function is the call to compare:
col.compare(pb1,             // Pointer to the first char of s1
            pb1 + s1.size(), // Pointer to one past the last char of s1
            pb2,             // Pointer to the first char of s2
            pb2 + s2.size()) // Pointer to one past the last char of s2
Depending on the execution character set of your implementation and the locales it provides, the program may or may not produce the results shown after the listing. But if you want string comparison to follow locale-specific rules, use collate::compare. Note that the standard does not require an implementation to support any locale other than “C,” so be sure to test with every locale you support.

The locale class also has built-in support for comparing strings in a given locale: it overloads operator() to compare two basic_string arguments using the locale’s collate facet. You can therefore use an instance of the locale class as the comparison functor when you call any standard algorithm that takes one, such as sort.

#include <iostream>
#include <locale>
#include <string>
#include <vector>
#include <algorithm>
#include <stdexcept>

using namespace std;

// Returns true if s1 collates before s2 according to the global locale
bool localeLessThan(const string& s1, const string& s2)
{
    const collate<char>& col = use_facet<collate<char> >(locale()); // Use the global locale
    const char* pb1 = s1.data();
    const char* pb2 = s2.data();
    return (col.compare(pb1, pb1 + s1.size(), pb2, pb2 + s2.size()) < 0);
}

int main()
{
    // Create two strings, one with a German character.
    // The bytes stored for "ä" depend on the source and execution character sets.
    string s1 = "diät";
    string s2 = "dich";
    vector<string> v;
    v.push_back(s1);
    v.push_back(s2);

    // Sort without a comparator: string's operator< compares character
    // codes and ignores the locale entirely.
    sort(v.begin(), v.end());
    for (vector<string>::const_iterator p = v.begin(); p != v.end(); ++p)
        cout << *p << endl;

    // Set the global locale to German, and then sort.
    // Locale names are platform-specific ("german" is a Windows name;
    // glibc systems typically use names like "de_DE.UTF-8"), and the
    // locale constructor throws runtime_error for an unknown name.
    try {
        locale::global(locale("german"));
    } catch (const runtime_error& e) {
        cerr << "German locale not available: " << e.what() << endl;
        return 1;
    }
    sort(v.begin(), v.end(), localeLessThan);

    for (vector<string>::const_iterator p = v.begin(); p != v.end(); ++p)
        cout << *p << endl;
}

The first sort compares raw character codes, ASCII-style, and the code for ä is greater than the code for c, so the output looks like this:
dich
diät

The second sort uses German collation rules, which treat ä like a, so the order is reversed:
diät
dich

Posted on May 23, 2013, in C++.
