Let's briefly sort out the relationship between character codes, strings, and the C/C++ types.
In C/C++, strings are handled with the char type.
Character code information is stored in this type, and ASCII is the code commonly used.
I say "commonly" because, strictly speaking, it is only that many compilers happen to handle ASCII.
For example, for Japanese, Windows uses Shift_JIS, which (to put it loosely) is ASCII or an extended version of it.
On UNIX systems, EUC is used, which (again, loosely) is ASCII or an extension of it.
In other words, you cannot necessarily know in advance which character code the contents of a char follow.
Even if you set Japanese aside and consider only alphabetic characters, there were many different OSes in the past, and some compilers used character codes other than ASCII for strings.
So keep in mind that the character code is a specification of the compiler, not of the C/C++ language.
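
To make the point above concrete, here is a minimal sketch that dumps the raw bytes stored in a char string. The helper name to_hex_bytes is hypothetical (it does not come from the original article), and the sketch assumes a C++11 compiler. A char string is only a sequence of bytes; which character code those bytes follow is decided outside the language.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: return the bytes of a narrow string as
// space-separated, two-digit hexadecimal values.
std::string to_hex_bytes(const std::string &s) {
    std::string out;
    for (unsigned char byte : s) {
        char buf[3];  // two hex digits plus the terminating NUL
        const int written =
            std::snprintf(buf, sizeof buf, "%02X", static_cast<unsigned>(byte));
        assert(written == 2);
        (void)written;  // silence "unused" warnings when NDEBUG is defined
        if (!out.empty()) out += ' ';
        out += buf;
    }
    return out;
}
```

On an ASCII-based compiler, to_hex_bytes("A") gives "41". The two bytes 0x82 0xA0 are the Shift_JIS encoding of the hiragana "a", but nothing in the char type itself records that fact.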
Next, the definition of wide characters.
A wide character, as the name says, uses more than one byte to represent a single character.
This is often misunderstood, so note that "wide character" does not mean "Unicode".
Certainly, UTF-16 and UTF-32 are wide characters by their specification.
However, be careful: the reverse does not necessarily hold.
In addition, C/C++ represents wide characters with the wchar_t type.
In C it is usually a typedef for a short integer type, whereas in C++ wchar_t is a reserved word, so its size and contents depend on the compiler.
On Windows, wchar_t is treated as a two-byte type, the same as short.
Also, if you only consider Windows, treating wide characters as identical to Unicode (UTF-16) causes no problem in most cases.
(This is a source of confusion.)
However, as the earlier discussion of strings (char) shows, that is not always true on other OSes.
Moreover, there is no guarantee that wchar_t is two bytes on other OSes (with GCC on Linux, for example, it is typically four bytes).
Given this, you can see why the following literals are quite problematic.
    const char    *c1 = "abcde";
    const wchar_t *c2 = L"abcde";
With this notation, you might expect c2 to be Unicode (UTF-16).
However, outside Windows that is not guaranteed.
(L"xxxx" denotes a wide-character string.
As described earlier, be careful: a wide string is not always UTF-16.)
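
The dependence on the compiler can be checked directly. The following is a minimal sketch, not taken from the original article: the function names guess_wide_encoding and wide_units are hypothetical, and the mapping from size to encoding is only a heuristic, since the language fixes neither the size nor the encoding of wchar_t.

```cpp
#include <cassert>
#include <cwchar>
#include <string>

// Hypothetical helper: guess the likely encoding of wide literals from the
// width of wchar_t. This is a heuristic, not a guarantee.
std::string guess_wide_encoding() {
    switch (sizeof(wchar_t)) {
        case 2:  return "probably UTF-16 (typical on Windows)";
        case 4:  return "probably UTF-32 (typical with GCC on Linux)";
        default: return "unknown";
    }
}

// Hypothetical helper: number of wchar_t units in a wide string.
std::size_t wide_units(const wchar_t *s) {
    assert(s != nullptr);
    return std::wcslen(s);
}
```

For L"abcde" the unit count is 5 everywhere, but a character outside the Basic Multilingual Plane such as L"\U0001F600" occupies two units where wchar_t is two bytes and one unit where it is four bytes, so the same source line yields different data on different platforms.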
In fact, C++0x (the upcoming C++ standard) adds char16_t (a 2-byte character) and char32_t (a 4-byte character) as new reserved words.
It also adds a literal syntax for explicitly specifying Unicode, written as follows.
    char     c1 = 'a';
    wchar_t  c2 = L'a';
    char16_t c3 = u'a';
    char32_t c4 = U'a';
Visual Studio 2010 is expected to include this specification.
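
As an illustration of what the u and U prefixes buy you, here is a minimal sketch that encodes a Unicode code point into UTF-16 by hand and can be compared against a u"..." literal. The function encode_utf16 is hypothetical (it is not part of the standard library or of the original article), and the sketch assumes a compiler with C++11 support.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: encode one Unicode code point as UTF-16 code units.
std::u16string encode_utf16(char32_t cp) {
    // Valid scalar values only: up to U+10FFFF, excluding the surrogate range.
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    std::u16string out;
    if (cp < 0x10000) {
        // Basic Multilingual Plane: one 16-bit unit.
        out += static_cast<char16_t>(cp);
    } else {
        // Supplementary planes: split into a surrogate pair.
        cp -= 0x10000;
        out += static_cast<char16_t>(0xD800 + (cp >> 10));
        out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
    }
    return out;
}
```

Unlike L"...", a u"..." literal is UTF-16 on every conforming compiler, so encode_utf16(0x3042) compares equal to u"\u3042" regardless of the OS, and a code point such as U+1F600 always becomes two char16_t units.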
What character code does QString use internally?
Answer: Unicode (UTF-16).
This is also stated in the Qt class reference.
However, the reference says nothing about endianness for QString.
Which endianness of UTF-16 does it adopt?
Answer: it depends on the system (OS); the native endianness is used.
If you want to confirm this explicitly, check the following header.
%Qt installation directory%\qt\src\corelib\global\qconfig.h
    /* Machine byte-order */
    #define Q_BIG_ENDIAN 4321
    #define Q_LITTLE_ENDIAN 1234
    #define Q_BYTE_ORDER Q_LITTLE_ENDIAN
            ^^^^^^^^^^^^ ^^^^^^^^^^^^^^^
The macro named Q_BYTE_ORDER is defined as either Q_BIG_ENDIAN or Q_LITTLE_ENDIAN.
(Here it is defined as Q_LITTLE_ENDIAN, so the encoding is of course UTF-16LE, little endian.)
※ For the internal representation, no BOM is attached.
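
In Qt code you would normally compare Q_BYTE_ORDER with Q_LITTLE_ENDIAN at compile time. As a Qt-free illustration of what "native endianness" means for a UTF-16 code unit, here is a minimal run-time sketch. All function names in it are hypothetical, and it assumes a C++11 compiler on which char16_t is exactly 16 bits wide.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: detect the host byte order by looking at the first
// byte of a known 16-bit value.
bool host_is_little_endian() {
    const std::uint16_t probe = 0x0102;
    unsigned char first = 0;
    std::memcpy(&first, &probe, 1);
    return first == 0x02;
}

// Hypothetical helper: name the UTF-16 flavor that native code units form.
const char *native_utf16_name() {
    return host_is_little_endian() ? "UTF-16LE" : "UTF-16BE";
}

// Hypothetical helper: the in-memory bytes of one UTF-16 code unit.
std::array<unsigned char, 2> utf16_unit_bytes(char16_t unit) {
    std::array<unsigned char, 2> bytes{};
    assert(bytes.size() == sizeof unit);  // the sketch assumes 16-bit char16_t
    std::memcpy(bytes.data(), &unit, bytes.size());
    return bytes;
}
```

On a little-endian machine, the code unit U+3042 is stored as the bytes 0x42 0x30; on a big-endian machine the order is reversed. This is why data taken straight from memory must be labeled UTF-16LE or UTF-16BE when it is written out without a BOM.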
When converting from other types to the QString class, which way is the most efficient?
What follows is based on my experience on Windows.
Results on other OSes may therefore differ somewhat.
Answer: QString provides various conversion patterns as static methods.
If you need to convert from another character code, using QString's static methods is the fastest.
Reference:
Compared with converting through the ICU library, QString's static method (fromLocal8Bit) is several times faster.
(This is for conversion from Shift_JIS to UTF-16.)
If the data is already UTF-16, it is faster to pass its address (short *) and the number of characters to QString's setUtf16(const ushort *, int).
(Note that, for the same character code, setting the data with setUtf16 was faster than setting it through the static method fromUtf16.)
Measured time to convert 20 Japanese characters (Shift_JIS) to UTF-16, repeated 1 million times

Library     | Function (method)      | Time (seconds)
Windows API | MultiByteToWideChar    | 1.7
Qt library  | QString::fromLocal8Bit | 2.0
ICU library | ucnv_toUnicode ※       | 7.2
ICU library | ucnv_convert           | 7.4

※ Because the processing was put into a simple subroutine, ucnv_open / ucnv_close was called on every iteration.
As expected, on Windows, using the Windows API is clearly the better choice.
Surprisingly, however, the ICU conversion is slow.
(I suspect that my own programming is at fault, and that the mistake lies in running the ucnv_open -> ucnv_toUnicode -> ucnv_close flow every time.)
Even so, the Qt library performs well.
Impressed, I looked inside and found that under WIN32 it simply uses the Windows API as is.
It is only a wrapper, so it is slower just by the overhead of the wrapping.
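
A measurement like the one above can be set up with a small timing harness. The following is a minimal sketch under stated assumptions, not the code used for the figures in the table: the function time_repeated is hypothetical, it uses std::chrono from C++11, and the conversion being measured is passed in as a callable.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical harness: call `convert` `iterations` times and return the
// elapsed wall-clock time in seconds.
double time_repeated(const std::function<void()> &convert, long iterations) {
    assert(iterations >= 0);
    const auto start = std::chrono::steady_clock::now();
    for (long i = 0; i < iterations; ++i) {
        convert();
    }
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

The callable would wrap one conversion, for example a call to MultiByteToWideChar or QString::fromLocal8Bit on the 20-character input, with iterations set to 1000000. For ICU, opening the converter with ucnv_open once before the loop and closing it afterwards keeps the setup cost out of the measured time, which is the suspected cause of the slow ICU figures.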