dechanzi(5)                                                  dechanzi(5)
NAME
dechanzi - A character encoding system (codeset) for Simplified Chinese
DESCRIPTION
The DEC Hanzi (dechanzi) codeset consists of the following character
sets:
ASCII
GB2312-80
Extended GB
DEC Hanzi uses a 2-byte data representation for symbols and ideographic
characters that are defined in GB2312-80.
ASCII Characters
All ASCII characters are represented in the form of single-byte, 7-bit
data in the DEC Hanzi codeset; that is, the most significant bit (MSB)
of the byte that represents an ASCII character is always set off. For
more information on ASCII characters, refer to ascii(5).
GB2312-80 Characters
The code table for GB2312-80 characters is divided into 94 rows (Qu),
numbered from 1 to 94. Each row has 94 columns (Wei), also numbered from
1 to 94. The code table defines a total of 7445 characters, of which
6763 are Chinese characters. The characters are grouped as follows:
Graphic symbols
There are 682 graphic symbols, which occupy rows 1 to 9 in the
code table.
Frequently used (Level 1) characters
There are 3755 frequently used characters, which occupy rows 16
to 55 in the code table.
Less frequently used (Level 2) characters
There are 3008 less frequently used characters, which occupy
rows 56 to 87 in the code table.
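The group sizes above can be cross-checked against the 94 by 94 layout
of the code table. The following Python sketch is illustrative only; the
names GROUPS and capacity are introduced here for the example and do not
come from the standard.

```python
# Illustrative cross-check of the GB2312-80 group sizes quoted above.
COLUMNS_PER_ROW = 94

# (first row, last row, characters defined) for each group in the text.
GROUPS = {
    "graphic symbols": (1, 9, 682),
    "level 1": (16, 55, 3755),
    "level 2": (56, 87, 3008),
}


def capacity(first_row: int, last_row: int) -> int:
    """Number of code points available in an inclusive range of rows."""
    return (last_row - first_row + 1) * COLUMNS_PER_ROW


total = sum(count for _, _, count in GROUPS.values())
hanzi = GROUPS["level 1"][2] + GROUPS["level 2"][2]

for name, (first, last, count) in GROUPS.items():
    print(f"{name}: {count} of {capacity(first, last)} code points used")
print("total characters:", total)
print("Chinese characters:", hanzi)
```

Each group fits within the rows assigned to it, and the sums agree with
the totals of 7445 characters and 6763 Chinese characters given above.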
To differentiate GB2312-80 character codes from ASCII and Extended GB
character codes, the most significant bit (MSB) of both the first byte
and the second byte is set on. The following formulas show how to
calculate the value for a GB2312-80 character from its row and column
numbers:
1st byte = A0(hex) + Row number
2nd byte = A0(hex) + Column number
For example, if a GB2312-80 character is in the first column of the
16th row, the character's value is B0A1, which is calculated as fol‐
lows:
1st byte = A0(hex) + 16 = B0(hex)
2nd byte = A0(hex) + 01 = A1(hex)
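The formulas above can be sketched in Python as follows. The function
name gb2312_encode is hypothetical and chosen for this example. The final
line uses Python's built-in gb2312 codec only as a cross-check for the
GB2312-80 part of the codeset; it is not the dechanzi converter.

```python
def gb2312_encode(row: int, col: int) -> bytes:
    """Return the 2-byte DEC Hanzi code for a GB2312-80 row and column."""
    if not (1 <= row <= 94 and 1 <= col <= 94):
        raise ValueError("row and column must be in the range 1 to 94")
    # Adding A0(hex) sets the most significant bit of both bytes.
    return bytes([0xA0 + row, 0xA0 + col])


code = gb2312_encode(16, 1)
print(code.hex().upper())     # B0A1
print(code.decode("gb2312"))  # the character at row 16, column 1
```

Because row and column numbers never exceed 94, both bytes always fall in
the range A1(hex) to FE(hex).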
Extended GB Characters
The Extended GB code table is similar to the GB2312 code table and is
divided into 94 rows and 94 columns (8836 code points). However, the
Extended GB code table provides code points for user-defined characters
(UDC). The 8836 code points in this table are divided into two areas:
User-defined area
This area spans rows 1 to 87 and provides 8178 code points.
User-defined (reserved) area
This area spans rows 88 to 94 and provides 658 code points. This
area is where users can define special and long-lasting user-
defined characters.
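The sizes of the two areas follow directly from the row ranges. The
following Python sketch shows the arithmetic and a row lookup; the
function name extended_gb_area and the constant names are assumptions
made for this example.

```python
COLUMNS_PER_ROW = 94
USER_DEFINED_ROWS = range(1, 88)   # rows 1 to 87
RESERVED_ROWS = range(88, 95)      # rows 88 to 94


def extended_gb_area(row: int) -> str:
    """Name the Extended GB area that contains the given row."""
    if row in USER_DEFINED_ROWS:
        return "user-defined"
    if row in RESERVED_ROWS:
        return "user-defined (reserved)"
    raise ValueError("row must be in the range 1 to 94")


user_defined_points = len(USER_DEFINED_ROWS) * COLUMNS_PER_ROW
reserved_points = len(RESERVED_ROWS) * COLUMNS_PER_ROW
print(user_defined_points, reserved_points)   # 8178 658
print(user_defined_points + reserved_points)  # 8836
```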
To differentiate Extended GB codes from ASCII codes and GB2312-80
codes, the most significant bit (MSB) of the first byte is set on while
that of the second byte is set off. The following formulas show how the
code value of an Extended GB character is calculated from its row and
column numbers:
1st byte = A0(hex) + Row number
2nd byte = 20(hex) + Column number
For example, if a character is positioned at the first column of the
16th row in the Extended GB code table, the character's value is B021,
which is calculated as follows:
1st byte = A0(hex) + 16 = B0(hex)
2nd byte = 20(hex) + 01 = 21(hex)
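Taken together, the MSB rules for the three character sets are enough to
split a DEC Hanzi byte string into characters. The following Python
sketch illustrates this under the rules stated in this reference page;
the function names extended_gb_encode and split_dechanzi are
hypothetical, and the sketch does not reflect how the operating system
itself parses the codeset.

```python
def extended_gb_encode(row: int, col: int) -> bytes:
    """Return the 2-byte DEC Hanzi code for an Extended GB row and column."""
    if not (1 <= row <= 94 and 1 <= col <= 94):
        raise ValueError("row and column must be in the range 1 to 94")
    # First byte has the MSB on; second byte has the MSB off.
    return bytes([0xA0 + row, 0x20 + col])


def split_dechanzi(data: bytes):
    """Yield ("ascii", byte) or (set_name, (row, col)) for each character."""
    i = 0
    while i < len(data):
        first = data[i]
        if first < 0x80:                 # MSB off: single-byte ASCII
            yield ("ascii", first)
            i += 1
            continue
        if i + 1 >= len(data):
            raise ValueError("truncated 2-byte character")
        second = data[i + 1]
        row = first - 0xA0
        if second & 0x80:                # MSB on in both bytes
            kind, col = "gb2312-80", second - 0xA0
        else:                            # MSB on, then MSB off
            kind, col = "extended-gb", second - 0x20
        if not (1 <= row <= 94 and 1 <= col <= 94):
            raise ValueError("byte pair is outside the 94 by 94 code table")
        yield (kind, (row, col))
        i += 2


sample = b"A" + bytes([0xB0, 0xA1]) + extended_gb_encode(16, 1)
print(list(split_dechanzi(sample)))
```

The second byte of an Extended GB character lies in the same range as
printable ASCII, so a byte string has to be scanned from its beginning;
a byte examined in isolation does not reveal whether it is an ASCII
character or the trailing half of an Extended GB character.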
Codeset Conversion
The following codeset converter pairs are available for converting Sim‐
plified Chinese characters between dechanzi and other encoding formats.
Refer to iconv_intro(5) for an introduction to codeset conversion. For
more information about the other codeset for which dechanzi is the
input or output, see the reference page specified in the list item.
big5_dechanzi, dechanzi_big5
Converting from and to the Big-5 codeset: big5(5)
dechanyu_dechanzi, dechanzi_dechanyu
Converting from and to the DEC Hanyu codeset: dechanyu(5)
eucTW_dechanzi, dechanzi_eucTW
Converting from and to Taiwanese Extended UNIX Code: eucTW(5)
UTF-16_dechanzi, dechanzi_UTF-16
Converting from and to UTF-16 format: Unicode(5)
UCS-4_dechanzi, dechanzi_UCS-4
Converting from and to UCS-4 format: Unicode(5)
UTF-8_dechanzi, dechanzi_UTF-8
Converting from and to UTF-8 format: Unicode(5)
DEC Hanzi encoding is similar to the Microsoft code-page format
(cp936) used for Simplified Chinese characters on PC systems. However,
DEC Hanzi supports fewer characters than the code page does.
Therefore, using converters with dechanzi in the converter name to
convert between cp936 and other formats can result in some data loss.
Refer to code_page(5) for more information about PC code pages.
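The kind of data loss described above can be illustrated with Python's
built-in codecs. This is a sketch under stated assumptions: Python's
gb2312 codec stands in for the GB2312-80 part of DEC Hanzi (it does not
cover Extended GB), and Python's cp936 codec stands in for the PC code
page. Neither is the dechanzi converter supplied by the operating
system.

```python
# U+554A is in GB2312-80 (row 16, column 1); U+9555 is in cp936 only.
text = "\u554a\u9555"

pc_bytes = text.encode("cp936")       # both characters are representable
assert pc_bytes.decode("cp936") == text

try:
    text.encode("gb2312")             # strict mode rejects U+9555
except UnicodeEncodeError as err:
    print("not representable:", repr(err.object[err.start:err.end]))

# With errors="replace", the unsupported character becomes "?".
lossy = text.encode("gb2312", errors="replace")
print(lossy)
```

Checking for characters that cannot be represented before converting
makes it possible to report the loss rather than silently substitute a
replacement character.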
DEC Hanzi Fonts
The operating system provides both screen and printer fonts for DEC
Hanzi characters. The operating system also provides bit map fonts in
addition to the TrueType fonts described in this section. For a com‐
plete description of DEC Hanzi fonts, see the document, Technical Ref‐
erence for Using Chinese Features.
The following set of Simplified Chinese TrueType fonts is installed as
the operating system default fonts for DEC Hanzi:
-css_dongwen-fangsong-medium-r-normal--0-0-0-0-c-0-gb2312.1980-0
-css_dongwen-fangsong-medium-r-normal--0-0-0-0-c-0-gb2312.1980-1
-css_dongwen-fangsong-medium-r-normal--0-0-0-0-c-0-iso8859-1
-css_dongwen-heiti-medium-r-normal--0-0-0-0-c-0-gb2312.1980-0
-css_dongwen-heiti-medium-r-normal--0-0-0-0-c-0-gb2312.1980-1
-css_dongwen-heiti-medium-r-normal--0-0-0-0-c-0-iso8859-1
-css_dongwen-kaiti-medium-r-normal--0-0-0-0-c-0-gb2312.1980-0
-css_dongwen-kaiti-medium-r-normal--0-0-0-0-c-0-gb2312.1980-1
-css_dongwen-kaiti-medium-r-normal--0-0-0-0-c-0-iso8859-1
-css_dongwen-songti-medium-r-normal--0-0-0-0-c-0-gb2312.1980-0
-css_dongwen-songti-medium-r-normal--0-0-0-0-c-0-gb2312.1980-1
-css_dongwen-songti-medium-r-normal--0-0-0-0-c-0-iso8859-1
The following set of Simplified Chinese TrueType fonts is available as
an installation option:
-huatian-fangsong-medium-r-normal--0-0-0-0-c-0-gb2312.1980-0
-huatian-fangsong-medium-r-normal--0-0-0-0-c-0-gb2312.1980-1
-huatian-fangsong-medium-r-normal--0-0-0-0-m-0-iso8859-1
-huatian-heiti-medium-r-normal--0-0-0-0-c-0-gb2312.1980-0
-huatian-heiti-medium-r-normal--0-0-0-0-c-0-gb2312.1980-1
-huatian-heiti-medium-r-normal--0-0-0-0-m-0-iso8859-1
-huatian-kaiti-medium-r-normal--0-0-0-0-c-0-gb2312.1980-0
-huatian-kaiti-medium-r-normal--0-0-0-0-c-0-gb2312.1980-1
-huatian-kaiti-medium-r-normal--0-0-0-0-m-0-iso8859-1
-huatian-songti-medium-r-normal--0-0-0-0-c-0-gb2312.1980-0
-huatian-songti-medium-r-normal--0-0-0-0-c-0-gb2312.1980-1
-huatian-songti-medium-r-normal--0-0-0-0-m-0-iso8859-1
With either the default or optional font sets installed, the SongTi
fonts are the default screen fonts for the DEC Hanzi codeset.
The operating system provides the following PostScript printer fonts
for DEC Hanzi characters:
Hei-GB2312-80
XiSong-GB2312-80
For general information on printing Asian language text, refer to
i18n_printing(5).
SEE ALSO
Commands: locale(1)
Others: ascii(5), big5(5), Chinese(5), code_page(5), dechanyu(5),
eucTW(5), GB18030(5), GBK(5), i18n_intro(5), i18n_printing(5),
iconv_intro(5), l10n_intro(5), sbig5(5), telecode(5), Unicode(5)