Sorting Localized Strings
Sorting becomes more complicated when you work with strings in different languages, and the standard library provides a solution. The collate facet provides a member function compare that works much like strcmp, except that its result is normalized: it returns -1 if the first string is less than the second, 0 if they are equal, and 1 if the first string is greater than the second. Unlike strcmp, collate::compare uses the collation rules of the target locale.
The listing below presents the function localeLessThan, which returns true if the first argument is less than the second according to the global locale. The most important part of the function is the call to compare:
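As a minimal sketch of the three-way result described above, the helper below wraps collate::compare for two std::string arguments. The name collateCompare and the default argument are assumptions made for illustration; they are not part of the original listing. It defaults to the classic "C" locale because that is the only locale every implementation must provide.

```cpp
#include <locale>
#include <string>

// Hypothetical helper (not from the original article): three-way comparison
// of two strings using the collate facet of the given locale.
// Returns a negative value, zero, or a positive value, like strcmp.
int collateCompare(const std::string& a, const std::string& b,
                   const std::locale& loc = std::locale::classic()) {
    const std::collate<char>& col =
        std::use_facet<std::collate<char> >(loc);
    return col.compare(a.data(), a.data() + a.size(),   // first range
                       b.data(), b.data() + b.size());  // second range
}
```

Calling collateCompare("abc", "abd") gives a negative result, and swapping the arguments gives a positive one. Passing a different locale object changes only the collation rules, not the calling convention.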
col.compare(pb1, // Pointer to the first char
pb1 + s1.size( ), // Pointer to one past the last char
pb2,
pb2 + s2.size( ))
Depending on the execution character set of your implementation, the listing below may or may not produce the results shown at the end of this section. But if you want to ensure string comparison works in a locale-specific manner, you should use collate::compare. Of course, the standard does not require an implementation to support any locales other than "C", so be sure to test for all the locales you support.
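Because locale names are implementation-defined, constructing a locale by name can fail at run time: the std::locale constructor throws std::runtime_error when the name is not supported. The sketch below shows one way to guard against that. The helper name localeOrClassic and the choice to fall back to the "C" locale are assumptions for illustration; your program may prefer to report the error instead.

```cpp
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper (not from the original article): try to construct the
// named locale, and fall back to the classic "C" locale if the
// implementation does not recognize the name.
std::locale localeOrClassic(const std::string& name) {
    try {
        return std::locale(name.c_str());
    } catch (const std::runtime_error&) {
        // Unsupported name: use the one locale that is always available
        return std::locale::classic();
    }
}
```

For example, a name such as "german" is typically accepted on Windows, while POSIX systems usually expect something like "de_DE" and only if that locale is installed. A silent fallback means the sort will still run, just with "C" ordering.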
The locale class has built-in support for comparing strings in a given locale by overloading operator( ). You can use an instance of the locale class as your comparison functor when you call any standard function that takes a functor for comparison.
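As a short sketch of the functor approach just described, a locale object can be passed directly as the third argument to std::sort. The wrapper name sortWithLocale is an assumption for illustration only.

```cpp
#include <algorithm>
#include <locale>
#include <string>
#include <vector>

// Hypothetical helper (not from the original article): sort a vector of
// strings using the locale object itself as the comparison functor.
// locale::operator() compares two strings with the locale's collate facet.
void sortWithLocale(std::vector<std::string>& v, const std::locale& loc) {
    std::sort(v.begin(), v.end(), loc);
}
```

This does the same job as a hand-written function like localeLessThan, but it uses the locale you pass in rather than the global locale.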
#include <iostream>
#include <locale>
#include <string>
#include <vector>
#include <algorithm>

using namespace std;

bool localeLessThan (const string& s1, const string& s2) {

   const collate<char>& col =
      use_facet<collate<char> >(locale( )); // Use the global locale

   const char* pb1 = s1.data( );
   const char* pb2 = s2.data( );

   return (col.compare(pb1, pb1 + s1.size( ),
                       pb2, pb2 + s2.size( )) < 0);
}

int main( ) {

   // Create two strings, one with a German character
   string s1 = "diät";
   string s2 = "dich";

   vector<string> v;
   v.push_back(s1);
   v.push_back(s2);

   // Sort with the default operator<, which compares raw character
   // values and ignores the locale
   sort(v.begin( ), v.end( ));
   for (vector<string>::const_iterator p = v.begin( );
        p != v.end( ); ++p)
      cout << *p << endl;

   // Set the global locale to German, and then sort.
   // Locale names are implementation-defined; "german" may not
   // be accepted on every platform.
   locale::global(locale("german"));
   sort(v.begin( ), v.end( ), localeLessThan);
   for (vector<string>::const_iterator p = v.begin( );
        p != v.end( ); ++p)
      cout << *p << endl;
}
The first sort compares raw character values, where the code for "ä" is greater than the code for "c", and therefore the output looks like this:
dich
diät
The second sort uses the proper ordering according to German collation rules, and the result is just the opposite:
diät
dich
Posted on May 23, 2013, in C++.