Qt(B5) About QString and UTF16

Published on: September 12th, 2009

Categories: Qt


Summary:
This time, the Qt topic is QString.
QString is a very convenient class for handling text, and the classes derived from QWidget that deal with strings generally use QChar or QString in common.

So, what is QString really?

If your Qt application handles strings only within the closed world of Qt, using QString everywhere is fine and causes no problems.
Also, if you have no plans to write cross-platform applications, this may not matter much to you.

Otherwise, though, it can become a very important issue, for example when you mix Qt with another library.
In that case the question is what kind of string the other library deals in.
It may be the common C/C++ types char and wchar_t. It may be the STL's string or wstring. On Windows it may be MFC's CString.
If the library deals in QString, there is no problem.
However, if it uses one of the other types or classes above, conversion to and from QString will follow your Qt application around everywhere.

In such cases, understanding a little about the nature of QString will help your programming.
So let me explain briefly.

First, a quick review of character encodings and how they relate to the C/C++ types.

C/C++ handles strings with the char type. The character encoding stored in this type is commonly ASCII. I say "commonly" because, strictly speaking, it is only that many compilers happen to use ASCII. For Japanese, for example, Windows uses Shift_JIS, which is (loosely speaking) an extended version of ASCII. UNIX systems use EUC, which is likewise (loosely speaking) an extension of ASCII.
In other words, the character encoding held in a char is not fixed in advance. Even leaving Japanese aside, there have been many operating systems in the past, and some compilers used encodings other than ASCII for their strings.
The point to note is that the character encoding is part of the compiler's specification, not part of the C/C++ language specification.

Next, the definition of wide characters. A wide character, as the name says, uses more than one byte to represent a single character. It is often misunderstood, but note that "wide character" does not mean Unicode.
Certainly, the UTF-16 and UTF-32 encodings are wide-character encodings. But be careful: the reverse does not necessarily hold.

In addition, C/C++ represents wide characters with the wchar_t type. In the C world it is usually mapped to short, but since wchar_t is a reserved word (a built-in type), its size depends on the compiler.
On Windows, wchar_t is 2 bytes, the same as short.
So if you consider only Windows, treating wide characters as the same thing as Unicode (UTF-16) causes no problems in most cases. (This is the source of the confusion.)
On other operating systems, however, this does not always hold, as you will have gathered from the discussion of char above. Nor do I know whether wchar_t is 2 bytes on other operating systems.
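The point about wchar_t can be checked directly on any compiler. The following is a minimal sketch in standard C++; the helper names (wideCharBits, wideCharCouldBeUtf16, wideLength) are made up for illustration and are not part of Qt or of the original article.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <string>

// Number of bits in one wchar_t on the platform this is compiled for.
constexpr std::size_t wideCharBits() {
    return sizeof(wchar_t) * CHAR_BIT;
}

// True only when wchar_t is exactly 16 bits wide. This is a necessary
// condition for L"..." to hold UTF-16 code units, but not a sufficient one:
// the encoding is still the compiler's choice.
constexpr bool wideCharCouldBeUtf16() {
    return wideCharBits() == 16;
}

// Number of wchar_t elements in a wide string, excluding the terminator.
inline std::size_t wideLength(const wchar_t* s) {
    assert(s != nullptr);
    return std::wstring(s).size();
}
```

Printing wideCharBits() on each target platform shows whether code that assumes a 2-byte wchar_t is safe there. On Windows compilers the value is normally 16; other platforms commonly use a wider wchar_t.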

With this in mind, you can see the problem with literals written like the following.


const char    *c1 = "abcde";
const wchar_t *c2 = L"abcde";

With this code, you might expect c2 to be Unicode (UTF-16). Outside Windows, however, that is not guaranteed.
(L"xxxx" denotes a wide-character string. As explained earlier, be careful: a wide string is not always UTF-16.)

In fact, C++0x (the upcoming C++ standard) adds the new reserved words char16_t (a 2-byte character) and char32_t (a 4-byte character). It also adds literal syntax for specifying Unicode explicitly, so you can write the following.

char     c1 = 'a';
wchar_t  c2 = L'a';
char16_t c3 = u'a';
char32_t c4 = U'a';

This feature is expected to be included in Visual Studio 2010.
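To show what the new literal prefixes give you, here is a small sketch in standard C++11. It counts code units in u"..." and U"..." strings; the helper names (utf16Units, utf32Units, isHighSurrogate) are hypothetical and chosen only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Number of UTF-16 code units in a u"..." string, excluding the terminator.
inline std::size_t utf16Units(const char16_t* s) {
    assert(s != nullptr);
    return std::u16string(s).size();
}

// Number of UTF-32 code units in a U"..." string, excluding the terminator.
inline std::size_t utf32Units(const char32_t* s) {
    assert(s != nullptr);
    return std::u32string(s).size();
}

// True when a UTF-16 code unit is the first half of a surrogate pair.
inline bool isHighSurrogate(char16_t unit) {
    return unit >= 0xD800 && unit <= 0xDBFF;
}
```

For plain text such as u"abcde" the two counts agree. A character outside the Basic Multilingual Plane, written for example as "\U0001F600", takes two code units in a u"..." string and one in a U"..." string, which is worth keeping in mind when passing lengths to APIs that count UTF-16 code units.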

What character encoding does QString use internally?

Answer
Unicode (UTF-16).

This is also stated in the Qt class reference.
However, the reference says nothing about endianness for QString.

Which endianness does the UTF-16 use?

Answer
It depends on the system (OS): the platform's native byte order is used.

If you want to confirm this, it is a good idea to check the following header.

%Qt installation directory%\qt\src\corelib\global\qconfig.h

/* Machine byte-order */
#define Q_BIG_ENDIAN 4321
#define Q_LITTLE_ENDIAN 1234
#define Q_BYTE_ORDER Q_LITTLE_ENDIAN
        ^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^

Q_BYTE_ORDER is defined as either Q_BIG_ENDIAN or Q_LITTLE_ENDIAN.
(Here it is defined as Q_LITTLE_ENDIAN, so the encoding is of course UTF-16LE, little endian.)

※ Because this is the internal representation, no BOM is attached.
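What the byte-order setting means for a UTF-16 code unit in memory can be seen with a few lines of standard C++. This sketch does not use Qt or Q_BYTE_ORDER; the helper names (isLittleEndian, codeUnitBytes) are assumptions made for illustration, and it assumes 8-bit bytes.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Returns true when the machine stores the low-order byte first.
inline bool isLittleEndian() {
    const std::uint16_t probe = 0x0001;
    unsigned char bytes[sizeof probe];
    std::memcpy(bytes, &probe, sizeof probe);
    return bytes[0] == 0x01;
}

// Copies the in-memory bytes of one UTF-16 code unit into out[0] and out[1],
// in the order the machine actually stores them.
inline void codeUnitBytes(char16_t unit, unsigned char out[2]) {
    assert(out != nullptr);
    const std::uint16_t value = static_cast<std::uint16_t>(unit);
    std::memcpy(out, &value, sizeof value);
}
```

For the code unit 0x3042, a little-endian machine stores the bytes 0x42 0x30 and a big-endian machine stores 0x30 0x42. This is why raw UTF-16 data written on one kind of system needs either a BOM or an agreed byte order before another system reads it.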

When converting other types to QString, which way is the most efficient?

What follows is based on my experience on Windows, so results may differ somewhat if you test on another OS.

Answer
QString provides various conversions as static methods.
If you need to convert from another character encoding, using these QString static methods is a fast way to do it.
For reference: QString's static method (fromLocal8Bit) is several times faster than converting with the ICU library (for conversion from Shift_JIS to UTF-16).

If the data is already UTF-16, it is faster to pass its address (as const ushort *) and the number of characters to QString's setUtf16(const ushort *, int).
(Note that even when the encoding is the same, setting the data with setUtf16 is faster than going through the static method fromUtf16.)

Time taken to convert 20 Japanese characters (Shift_JIS) to UTF-16 one million times:

Library        Function (method)         Time (seconds)
Windows API    MultiByteToWideChar       1.7
Qt library     QString::fromLocal8Bit    2.0
ICU library    ucnv_toUnicode ※          7.2
ICU library    ucnv_convert              7.4

※ To keep the test routine simple, ucnv_open / ucnv_close were called on every iteration.

As expected, on Windows, using the Windows API is clearly the better choice.
Surprisingly, though, the ICU conversion is slow. (I suspect my own code may be at fault: calling ucnv_open -> ucnv_toUnicode -> ucnv_close in sequence every time may be the mistake.)

Even so, the Qt library performs well.
I was impressed, but it turns out that internally, on WIN32, it simply calls the Windows API. It is just a wrapper, so it is slower only by the cost of the wrapping.
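A measurement of this kind can be set up with a simple timing loop. The following is only a sketch in standard C++ and is not the code used for the numbers in this article. The function widenAscii is a hypothetical stand-in for the conversion under test (it handles 7-bit ASCII only), and timeConversions is an illustrative helper name.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <string>

// Hypothetical stand-in for the conversion being measured:
// widens 7-bit ASCII bytes to UTF-16 code units.
inline std::u16string widenAscii(const std::string& in) {
    std::u16string out;
    out.reserve(in.size());
    for (unsigned char c : in) {
        assert(c < 0x80);  // this stand-in does not handle other encodings
        out.push_back(static_cast<char16_t>(c));
    }
    return out;
}

// Calls convert(input) `iterations` times and returns the elapsed seconds.
template <typename Convert>
double timeConversions(Convert convert, const std::string& input,
                       std::size_t iterations) {
    std::size_t sink = 0;  // use each result so the work stays observable
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < iterations; ++i) {
        sink += convert(input).size();
    }
    const auto stop = std::chrono::steady_clock::now();
    assert(sink == iterations * convert(input).size());
    return std::chrono::duration<double>(stop - start).count();
}
```

To compare real conversions, you would pass a callable wrapping each one in place of widenAscii and use the same input and iteration count for all of them. Opening and closing a converter inside the callable, as in the ICU case above, is then included in the measured time.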
