Expecting UTF8

Expecting UTF8
Prev	Chapter 20. Internationalization and Localization	Next

A properly internationalized application will not make assumptions about the number of bytes in a character. That means that you shouldn't use pointer arithmetic to step through the characters in a string, and it means you shouldn't use std::string or standard C functions such as strlen() because they make the same assumption.

However, you probably already avoid bare char* arrays and pointer arithmetic by using std::string, so you just need to start using Glib::ustring instead. See the Basics chapter about Glib::ustring.

Glib::ustring and std::iostreams

TODO: This section is not clear - it needs to spell things out more clearly and obviously.

Unfortunately, the integration with the standard iostreams is not completely foolproof. gtkmm converts Glib::ustrings to a locale-specific encoding (which usually is not UTF-8) if you output them to an ostream with operator<<. Likewise, retrieving Glib::ustrings from istream with operator>> causes a conversion in the opposite direction. But this scheme breaks down if you go through a std::string, e.g. by inputting text from a stream to a std::string and then implicitly converting it to a Glib::ustring. If the string contained non-ASCII characters and the current locale is not UTF-8 encoded, the result is a corrupted Glib::ustring. You can work around this with a manual conversion. For instance, to retrieve the std::string from a ostringstream:

std::ostringstream output;
output.imbue(std::locale("")); // use the user's locale for this stream
output << percentage << " % done";
label->set_text(Glib::locale_to_utf8(output.str()));