PioneerPOS G24-LC User Manual

Chapter 1:  Product Features
 April 15, 2008
G24-L AT Commands Reference Manual
Character Sets
The following conversion tables map between the different character sets:
CS1 - GSM to UCS2.
CS2 - ASCII to/from UTF8.
CS3 - UCS2 to/from UTF8.
For the full content of a specific conversion table, refer to Appendix A, Character Set Tables.
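As a rough illustration of what a CS3-style conversion does, the sketch below converts between a UCS2 hex string and UTF-8 using Python's built-in codecs. It does not reproduce the tables in Appendix A. The function names and the assumption that UCS2 text is exchanged as big-endian hex digits are illustrative only and are not taken from this manual.

```python
def ucs2_hex_to_utf8(hex_text):
    """Convert a UCS2 hex string (assumed big-endian) to UTF-8 bytes."""
    raw = bytes.fromhex(hex_text)
    if len(raw) % 2 != 0:
        raise ValueError("UCS2 data must contain an even number of octets")
    # For characters in the 2-octet range, UCS2 and UTF-16 coincide,
    # so the built-in UTF-16 codec is used here as a stand-in.
    return raw.decode("utf-16-be").encode("utf-8")


def utf8_to_ucs2_hex(utf8_bytes):
    """Convert UTF-8 bytes to a UCS2 hex string (assumed big-endian)."""
    text = utf8_bytes.decode("utf-8")
    for ch in text:
        if ord(ch) > 0xFFFF:
            raise ValueError("character does not fit in two octets")
    return text.encode("utf-16-be").hex().upper()
```

For example, the UCS2 hex string 004100E9 ("A" followed by "é") converts to the three UTF-8 bytes 41 C3 A9, and back again.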
ASCII Character Set Management
The ASCII character set is a standard seven-bit code that was proposed by ANSI in 1963, and 
finalized in 1968. ASCII was established to achieve compatibility between various types of data 
processing equipment.
GSM Character Set Management
In G24-L, the GSM character set is defined as an octet stream. This means that text is displayed not 
as GSM characters but as the hex values of these characters.
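The following sketch shows what displaying text as hex values of GSM characters can look like. It uses only a small, partial subset of the GSM default alphabet (letters, digits, and the space share their ASCII values, while "@", "$", and "_" do not). The function name and the exact output format (uppercase hex digits, no separators) are assumptions for illustration; the format used by a given module should be checked against its command descriptions.

```python
# Partial table: characters whose GSM default-alphabet value differs from ASCII.
# Only the entries needed for this sketch are listed; it is not the full table.
GSM_EXCEPTIONS = {"@": 0x00, "$": 0x02, "_": 0x11}


def gsm_hex_string(text):
    """Return the GSM octets of text as a string of hex digits."""
    octets = []
    for ch in text:
        if ch in GSM_EXCEPTIONS:
            octets.append(GSM_EXCEPTIONS[ch])
        elif ch.isascii() and (ch.isalnum() or ch == " "):
            # Letters, digits, and space have the same value in GSM and ASCII.
            octets.append(ord(ch))
        else:
            raise ValueError("character not covered by this partial table: %r" % ch)
    return "".join("%02X" % octet for octet in octets)
```

With this partial table, the text "Hi @" becomes 48692000: the "@" character appears as 00, not as the ASCII value 40.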
UCS2 Character Set Management
UCS2 is the 2-octet form of UCS (the Universal Character Set, ISO/IEC 10646), the first officially 
standardized coded character set, intended eventually to include the characters of all the written 
languages in the world, as well as all mathematical and other symbols.
Unicode can be characterized as the (restricted) 2-octet form of UCS on (the most general) 
implementation level 3, with the addition of a more precise specification of the bi-directional 
behavior of characters, as used in the Arabic and Hebrew scripts.
The 65,536 positions in the 2-octet form of UCS2 are divided into 256 rows with 256 cells in 
each. The first octet of a character representation denotes the row number, the second the cell 
number. The first row (row 0) contains exactly the same characters as ISO/IEC 8859-1. The first 
128 characters are thus the ASCII characters. The octet representing an ISO/IEC 8859-1 character 
is easily transformed to the representation in UCS2 by placing a 0 octet in front of it. UCS2 
includes the same control characters as ISO/IEC 8859 (also in row 0).
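The row/cell layout and the zero-octet prefix described above can be sketched as follows. The function names are illustrative, and the sketch assumes the row octet comes first (big-endian order), as the description of the first and second octet implies.

```python
def latin1_to_ucs2(data):
    """Convert ISO/IEC 8859-1 octets to UCS2 by prefixing each with a 0 octet."""
    out = bytearray()
    for octet in data:
        out.append(0x00)   # row 0
        out.append(octet)  # cell number equals the ISO/IEC 8859-1 value
    return bytes(out)


def row_and_cell(ucs2_char):
    """Split one 2-octet UCS2 character into (row, cell)."""
    if len(ucs2_char) != 2:
        raise ValueError("a UCS2 character is exactly two octets")
    return ucs2_char[0], ucs2_char[1]
```

For instance, the ISO/IEC 8859-1 octet E9 ("é") becomes the two octets 00 E9, that is, row 0, cell 0xE9. The Hebrew letter alef, 05 D0 in UCS2, lies in row 5, cell 0xD0.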
UTF-8 Character Set Management
UTF-8 provides a compact, efficient encoding of Unicode. It is a multi-byte encoding that 
distributes the bit pattern of a Unicode code value across one, two, three, or four bytes.
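To make the bit distribution concrete, here is a minimal sketch of a UTF-8 encoder for a single code value. It follows the standard UTF-8 byte layout; the function name is illustrative, and in practice Python's built-in str.encode("utf-8") does the same job.

```python
def utf8_encode(code_point):
    """Encode one Unicode code value as UTF-8 bytes."""
    if code_point < 0 or code_point > 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("not an encodable Unicode code value")
    if code_point < 0x80:
        # 1 byte: 0xxxxxxx (ASCII is unchanged)
        return bytes([code_point])
    if code_point < 0x800:
        # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:
        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])
```

The lead byte signals how many bytes follow, and every continuation byte starts with the bits 10, so the six low bits of each continuation byte carry the payload.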
UTF-8 encodes ASCII in a single byte, so text in languages that use Latin-based scripts needs 
only about 1.1 bytes per character on average. 
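The average depends on how many non-ASCII characters a text contains, and it can be measured directly for any sample. The helper below is an illustrative sketch (the function name is not from this manual); it does not reproduce how the 1.1 figure was obtained.

```python
def average_utf8_bytes(text):
    """Return the mean number of UTF-8 bytes per character in text."""
    if not text:
        raise ValueError("text must not be empty")
    return len(text.encode("utf-8")) / len(text)
```

Pure ASCII text gives exactly 1.0, and each accented Latin letter, which takes two bytes, raises the average slightly.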
UTF-8 is useful for legacy systems that want Unicode support because developers do not have to 
drastically modify text processing code. Code that assumes single-byte code units typically does 
not fail completely when provided UTF-8 text instead of ASCII or even Latin-1.
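The sketch below illustrates this with byte-oriented parsing code that knows nothing about UTF-8. Because every byte of a UTF-8 multi-byte sequence has its high bit set, such a byte is never mistaken for an ASCII delimiter such as "," or "=". The record format and the function name are made up for this example.

```python
def parse_pairs(raw):
    """Split a byte string of key=value pairs separated by commas."""
    fields = {}
    for item in raw.split(b","):
        key, _, value = item.partition(b"=")
        fields[key.decode("utf-8")] = value.decode("utf-8")
    return fields


record = "name=José,city=Zürich".encode("utf-8")
parsed = parse_pairs(record)

# Byte-based length no longer equals the character count:
# "José" has 4 characters but occupies 5 bytes in UTF-8.
name_bytes = len("José".encode("utf-8"))
```

The splitting still works on the accented text, but anything that counts or truncates by bytes (such as name_bytes above) reports byte lengths, not character counts. This is the sense in which such code does not fail completely, yet is not fully correct either.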