

Prepare Yourself for the Unicode Revolution

Character Literals
You're already familiar with char and wchar_t literals such as the following:

char c1='a';
wchar_t wc1=L'a';
Literals of type _Char16_t look like this:

_Char16_t uc1=u'a'; 
Here, the literal u'a' represents a constant integral value whose type is _Char16_t. The size of the literal constant equals sizeof(_Char16_t).
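The following compile-time checks make these two claims concrete. This is a minimal sketch that assumes a C++11 (or later) compiler. C++11 standardized the proposal's _Char16_t as the built-in type char16_t, so the sketch uses that spelling.

```cpp
#include <type_traits>

// Assumption: C++11 or later, where _Char16_t is spelled char16_t.
constexpr char16_t uc1 = u'a';

// The literal u'a' has type char16_t, so its size equals sizeof(char16_t).
static_assert(std::is_same<decltype(u'a'), char16_t>::value, "u'a' is a char16_t");
static_assert(sizeof(u'a') == sizeof(char16_t), "literal size matches its type");

// Its value is the code point of 'a', U+0061.
static_assert(uc1 == 0x0061, "u'a' holds U+0061");
```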

A literal of type _Char32_t looks like this:

_Char32_t uc2=U'a';
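As you can see, the prefixes u and U have different meanings in this context: a lowercase u denotes a _Char16_t literal, while an uppercase U denotes a _Char32_t literal.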
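The sketch below shows that the two prefixes produce distinct types, and where that difference matters. It assumes C++11 or later, in which _Char32_t is spelled char32_t. The character U+1F600, written with a universal character name, is an illustrative choice: it fits in a single char32_t, but it cannot be a u'' literal because UTF-16 needs two code units for it.

```cpp
#include <type_traits>

// Assumption: C++11 or later (_Char16_t/_Char32_t are spelled char16_t/char32_t).
constexpr char32_t uc2 = U'a';

// Lowercase u yields a char16_t literal; uppercase U yields a char32_t literal.
static_assert(!std::is_same<decltype(u'a'), decltype(U'a')>::value,
              "u and U produce different types");
static_assert(std::is_same<decltype(U'a'), char32_t>::value, "U'a' is a char32_t");

// Both hold the same code point; only the width of the type differs.
static_assert(uc2 == u'a', "same value, U+0061");

// A code point above U+FFFF fits in one char32_t literal. The equivalent
// u'\U0001F600' would be ill-formed, since it needs two UTF-16 code units.
constexpr char32_t grin = U'\U0001F600';
static_assert(grin == 0x1F600, "one char32_t holds the whole code point");
```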

String Literals
To define _Char16_t and _Char32_t string literals, use the u and U prefixes, respectively:

const _Char16_t utf16msg[]= u"hello";
const _Char32_t utf32msg[]= U"hello";
The string literal u"hello" has type "array of n const _Char16_t" and static storage duration, where n is the total number of escape sequences, universal-character-names, and other characters in the literal, plus one for the terminating u'\0'. For u"hello", n is 6. U"hello" works the same way, with _Char32_t elements and a terminating U'\0'.
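Here is how the array size n can be checked at compile time. This is a minimal sketch that assumes C++11 or later, where the element types are spelled char16_t and char32_t rather than _Char16_t and _Char32_t.

```cpp
#include <type_traits>

// Assumption: C++11 or later (char16_t/char32_t instead of _Char16_t/_Char32_t).
const char16_t utf16msg[] = u"hello";
const char32_t utf32msg[] = U"hello";

// n = 5 characters + 1 terminating null = 6 elements in both arrays.
static_assert(sizeof(utf16msg) / sizeof(utf16msg[0]) == 6, "u\"hello\" has 6 elements");
static_assert(sizeof(utf32msg) / sizeof(utf32msg[0]) == 6, "U\"hello\" has 6 elements");

// The literal itself is an lvalue of type "array of 6 const char16_t".
static_assert(std::is_same<decltype(u"hello"), const char16_t (&)[6]>::value,
              "u\"hello\" is const char16_t[6]");

// The last element is the terminating u'\0'.
static_assert(u"hello"[5] == u'\0', "null-terminated");
```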

Universal Character Names
Universal character names take the form \unnnn or \Unnnnnnnn, where the hexadecimal digits give the code value of a Unicode character defined in Annex C of ISO 10646-1. For example, '\u0531' is the universal character name of ARMENIAN CAPITAL LETTER AYB, the first letter of the Armenian alphabet. At present, compiler support for universal character names varies: some C++ compilers accept them, while others don't.
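The sketch below uses universal character names inside the new literal types. It assumes a C++11 (or later) compiler, which accepts universal character names in char16_t and char32_t literals and spells the types that way.

```cpp
// Assumption: C++11 or later. U+0531 is ARMENIAN CAPITAL LETTER AYB.
constexpr char16_t ayb16 = u'\u0531';
constexpr char32_t ayb32 = U'\U00000531';

// \unnnn and \Unnnnnnnn name the same code point with 4 or 8 hex digits.
static_assert(ayb16 == 0x0531 && ayb32 == 0x0531, "both name U+0531");

// Inside a string literal, a UCN counts as one character: n = 1 + terminator.
static_assert(sizeof(u"\u0531") / sizeof(char16_t) == 2, "one code unit plus terminator");
```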

A new standard header, <cuchar>, will be added to C++; it defines the typedefs char16_t and char32_t. If the macro __STDC_UTF_16__ is defined in <cuchar>, values of type _Char16_t shall have UTF-16 encoding, as defined by ISO 10646. Similarly, if the macro __STDC_UTF_32__ is defined in <cuchar>, values of type _Char32_t shall have UTF-32 encoding.
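What UTF-16 encoding means in practice is easiest to see with a character outside the Basic Multilingual Plane. This is a minimal sketch under stated assumptions. It targets C++11 or later, where a char16_t string literal stores such a character as a surrogate pair. Whether __STDC_UTF_16__ is visible without including <cuchar> can vary between compilers, so the macro check is informational only. The helpers high_surrogate and low_surrogate are hypothetical names introduced for illustration.

```cpp
// Assumption: C++11 or later. The macro check is informational only;
// nothing below depends on its result.
#if defined(__STDC_UTF_16__)
constexpr bool utf16_macro_defined = true;
#else
constexpr bool utf16_macro_defined = false;
#endif

// Hypothetical helpers: split a supplementary code point (U+10000 and above)
// into its UTF-16 high and low surrogates.
constexpr char16_t high_surrogate(char32_t cp) {
    return static_cast<char16_t>(0xD800 + ((cp - 0x10000) >> 10));
}
constexpr char16_t low_surrogate(char32_t cp) {
    return static_cast<char16_t>(0xDC00 + ((cp - 0x10000) & 0x3FF));
}

// U+1F600 takes one char32_t code unit but a surrogate pair in UTF-16.
constexpr char32_t grin = U'\U0001F600';
static_assert(sizeof(U"\U0001F600") / sizeof(char32_t) == 2, "1 unit + terminator");
static_assert(sizeof(u"\U0001F600") / sizeof(char16_t) == 3, "2 units + terminator");
static_assert(u"\U0001F600"[0] == high_surrogate(grin), "high surrogate 0xD83D");
static_assert(u"\U0001F600"[1] == low_surrogate(grin), "low surrogate 0xDE00");
```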

The Standard Library will also provide _Char16_t and _Char32_t typedefs, analogous to wstring, wcout, and the other wchar_t typedefs, for the following standard classes:

· filebuf, streambuf, streampos, streamoff
· ios, istream, ostream
· fstream, ifstream, ofstream
· stringstream, istringstream, ostringstream
· string
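The string typedefs from this list can be checked directly. This is a sketch assuming C++11 or later, where they are spelled std::u16string and std::u32string. It covers only the string typedefs. The functions greeting16 and greeting32 are hypothetical helpers added for illustration.

```cpp
#include <string>
#include <type_traits>

// Assumption: C++11 or later, where the string typedefs are spelled
// std::u16string and std::u32string.
static_assert(std::is_same<std::u16string, std::basic_string<char16_t>>::value,
              "u16string is basic_string<char16_t>");
static_assert(std::is_same<std::u32string, std::basic_string<char32_t>>::value,
              "u32string is basic_string<char32_t>");

// Hypothetical helpers that build the "hello" strings as library objects.
std::u16string greeting16() { return u"hello"; }
std::u32string greeting32() { return U"hello"; }
```

Keep in mind that size() counts code units, not characters. For example, std::u16string(u"\U0001F600").size() is 2, while std::u32string(U"\U0001F600").size() is 1.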

Improvements in the Pipeline
In conclusion, C++ is about to rid itself of another major embarrassment: the lack of native Unicode support. The proposal is now being added to the Working Paper, which means that it will be incorporated into C++09. It's time to get ready!

Danny Kalev is a certified system analyst and software engineer specializing in C++. He was a member of the C++ standards committee between 1997 and 2000 and has since been involved informally in the C++0x standardization process. He is the author of "The ANSI/ISO Professional C++ Programmer's Handbook" and "The Informit C++ Reference Guide: Techniques, Insight, and Practical Advice on C++."