iconv_ibmkanji(5)
NAME
iconv_ibmkanji - Specification for controlling conversion between IBM
Kanji and Tru64 UNIX Japanese codesets
DESCRIPTION
The iconv utility supports the ability to convert the encoding of char‐
acters between IBM Kanji System Characters (IBM Kanji) and one of the
following Tru64 UNIX codesets: DEC Kanji, Super DEC Kanji, Japanese
EUC, or Shift JIS. You choose the type of conversion by specifying the
appropriate values for the utility's from-code and to-code parameters,
as follows:
─────────────────────────────────────────────────────
Type of Code Conversion         from-code   to-code
─────────────────────────────────────────────────────
IBM Kanji to DEC Kanji          ibmkanji    deckanji
IBM Kanji to Super DEC Kanji    ibmkanji    sdeckanji
IBM Kanji to Japanese EUC       ibmkanji    eucJP
IBM Kanji to Shift JIS          ibmkanji    SJIS
DEC Kanji to IBM Kanji          deckanji    ibmkanji
Super DEC Kanji to IBM Kanji    sdeckanji   ibmkanji
Japanese EUC to IBM Kanji       eucJP       ibmkanji
Shift JIS to IBM Kanji          SJIS        ibmkanji
─────────────────────────────────────────────────────
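As a minimal sketch of how the table above might be applied from a script, the following Python fragment builds an argument list for iconv(1) using its standard -f and -t options. The helper name iconv_argv and the input file name are hypothetical; the codeset names are the ones listed in the table.

```python
# Codeset names taken from the from-code/to-code table above.
TRU64_CODESETS = ("deckanji", "sdeckanji", "eucJP", "SJIS")
IBM_KANJI = "ibmkanji"


def iconv_argv(from_code, to_code, path):
    """Return an argument list for iconv(1), or raise ValueError.

    Exactly one side of the conversion must be ibmkanji and the other
    side must be one of the four Tru64 UNIX Japanese codesets.
    """
    pair_ok = (
        (from_code == IBM_KANJI and to_code in TRU64_CODESETS)
        or (to_code == IBM_KANJI and from_code in TRU64_CODESETS)
    )
    if not pair_ok:
        raise ValueError(f"unsupported conversion: {from_code} -> {to_code}")
    return ["iconv", "-f", from_code, "-t", to_code, path]


if __name__ == "__main__":
    # report.txt is a placeholder file name.
    print(iconv_argv("eucJP", "ibmkanji", "report.txt"))
```

The resulting list can be passed to a process-spawning call on a system where these converters are installed.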
Conversion behavior for the following items is affected by the
definition of environment variables or profile entries in the user's
environment. For more information, see the “Environment Variables” and
“Profile File” sections.
The UDC (User-Defined Character) mapping table that is used for UDC
conversion
    This table must be an ASCII text file that contains UDC mapping
    information. The table affects conversion of user-defined
    characters between the codesets.
The EBCDIC to/from ISO code (ASCII, JIS Roman characters) mapping table
that is used for conversion
    This table must be an ASCII text file that contains information on
    how to map characters between EBCDIC and ISO code.
The K-shift code
    This is a one- or two-byte hexadecimal code that marks the
    beginning of Kanji mode.
The A-shift code
    This is a one- or two-byte hexadecimal code that marks the
    beginning of EBCDIC mode.
The status of the initial mode (Kanji or EBCDIC) at the time the iconv
command starts, or the first time the iconv() function is called after
the iconv_open() function initializes the converter in a program
    The status keywords are either kanji_mode or ebcdic_mode.
How to treat undefined characters when these are detected in Kanji mode
    Specify this action by using one of the following keywords:
        Stop codeset conversion.
        Output the undefined characters without any processing and
          continue codeset conversion.
        Output padding characters instead of the undefined characters
          and continue codeset conversion.
        Ignore the undefined characters and continue codeset
          conversion.
The two-byte padding character used in Kanji mode
    This value is meaningful when replace is chosen for the processing
    of undefined characters in Kanji mode. Specify the padding
    character by its hexadecimal value.
How to treat undefined characters when these are detected in EBCDIC
mode
    Specify this action by using one of the following keywords:
        Stop codeset conversion.
        Output the undefined characters without any processing and
          continue codeset conversion.
        Output padding characters instead of the undefined characters
          and continue codeset conversion.
        Ignore the undefined characters and continue codeset
          conversion.
The one-byte padding character used in EBCDIC mode
    This value is meaningful when replace is chosen for the processing
    of undefined characters in EBCDIC mode. Specify the padding
    character by its hexadecimal value.
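To illustrate how the K-shift and A-shift codes delimit the two modes, the following Python sketch splits a shift-coded byte string into mode segments. It assumes that shift codes are recognized only at character boundaries and that every Kanji mode character is two bytes wide. The function name split_by_mode is hypothetical, and the sketch performs no character conversion.

```python
def split_by_mode(data, k_shift=b"\x0e", a_shift=b"\x0f",
                  initial="ebcdic_mode"):
    """Split shift-coded bytes into (mode, payload) segments.

    Assumptions (not taken from the converter itself): shift codes are
    checked only at character boundaries, and Kanji mode characters
    are always two bytes wide.
    """
    if not k_shift or not a_shift:
        raise ValueError("shift codes must not be empty")
    segments = []
    mode = initial
    buf = bytearray()
    i = 0
    while i < len(data):
        if data.startswith(k_shift, i):
            new_mode, step = "kanji_mode", len(k_shift)
        elif data.startswith(a_shift, i):
            new_mode, step = "ebcdic_mode", len(a_shift)
        else:
            new_mode, step = None, 0
        if new_mode is not None:
            # A shift code closes the current segment and switches mode.
            if buf:
                segments.append((mode, bytes(buf)))
                buf.clear()
            mode = new_mode
            i += step
            continue
        width = 2 if mode == "kanji_mode" else 1
        buf += data[i:i + width]
        i += width
    if buf:
        segments.append((mode, bytes(buf)))
    return segments


if __name__ == "__main__":
    sample = b"\xc1\xc2\x0e\x45\x41\x45\x42\x0f\xc3"
    for mode, payload in split_by_mode(sample):
        print(mode, payload.hex())
```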
When the to-code parameter for the conversion is ibmkanji, you can also
specify the following items for conversion behavior:
Whether the initial shift code is output at the start of conversion if
the status of the initial mode (Kanji or EBCDIC) is different from the
mode of the first input character
    The start of conversion is the time the iconv utility starts
    processing, or when the iconv() function is called just after
    opening the converter with iconv_open(). Keyword values for this
    item are yes or no.
Whether or not the utility outputs the last shift code when iconv() is
called with a zero-length input string, and the current mode (Kanji or
EBCDIC) is different from the mode specified by the last shift state
    Keyword values for this item are yes or no.
The last status (Kanji mode or EBCDIC mode)
    Specify kanji_mode or ebcdic_mode for this value. It is meaningful
    only when yes is the setting for whether the utility outputs the
    last shift code.
If the items that control conversion behavior are specified by both
environment variables and the profile file, values set by environment
variables override values set by comparable entries in the profile.
Note that values for all conversion control items are case-sensitive,
whether they are set by environment variables or in the profile. The
following table contains the default values for each conversion control
item:
────────────────────────────────────────────────────
Conversion Control Item                Default Value
────────────────────────────────────────────────────
UDC mapping table                      None
K shift code                           0x0e
A shift code                           0x0f
Initial state                          ebcdic_mode
Processing for undefined characters
  in Kanji mode                        abort
Processing for undefined characters
  in EBCDIC mode                       pass
────────────────────────────────────────────────────
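The precedence described above (environment variable first, then profile entry, then default) can be sketched in Python as follows. This is only an illustration: the function name effective_value is hypothetical, the item keys are chosen for the example, and the caller is assumed to have already collected the environment and profile settings into dictionaries that use the same keys.

```python
# Defaults copied from the table above; None means "no table".
DEFAULTS = {
    "udc_mapping_table": None,
    "k_shift_code": "0x0e",
    "a_shift_code": "0x0f",
    "initial_state": "ebcdic_mode",
    "kanji_except_proc": "abort",
    "ebcdic_except_proc": "pass",
}


def effective_value(item, env_values, profile_values):
    """Return the setting for item using env > profile > default.

    env_values and profile_values are plain dictionaries keyed like
    DEFAULTS; how they are filled in is outside this sketch.
    """
    for source in (env_values, profile_values, DEFAULTS):
        if item in source:
            return source[item]
    return None


if __name__ == "__main__":
    env = {"kanji_except_proc": "replace"}
    profile = {"kanji_except_proc": "pass", "initial_state": "kanji_mode"}
    for name in DEFAULTS:
        print(name, effective_value(name, env, profile))
```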
The default padding characters are white spaces, whose code values for
each destination codeset are noted in the following table. These pad‐
ding characters are output when you specify replace for processing of
undefined characters and do not explicitly specify the padding charac‐
ter.
───────────────────────────────────────────────────
Mode          Default Value   Destination Codeset
───────────────────────────────────────────────────
Kanji mode    0x44e9          ibmkanji
              0xa1a1          deckanji, sdeckanji,
                              or eucJP
              0x8140          SJIS
EBCDIC mode   0x40            ibmkanji
              0x20            deckanji, sdeckanji,
                              eucJP, or SJIS
───────────────────────────────────────────────────
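The default padding values in the table above can be represented as a small lookup, shown here as a Python sketch. The helper name padding_bytes is hypothetical, and the sketch assumes that a two-byte value such as 0x44e9 is emitted high byte first.

```python
# Default padding values copied from the table above.
DEFAULT_PADDING = {
    "kanji_mode": {
        "ibmkanji": 0x44E9,
        "deckanji": 0xA1A1,
        "sdeckanji": 0xA1A1,
        "eucJP": 0xA1A1,
        "SJIS": 0x8140,
    },
    "ebcdic_mode": {
        "ibmkanji": 0x40,
        "deckanji": 0x20,
        "sdeckanji": 0x20,
        "eucJP": 0x20,
        "SJIS": 0x20,
    },
}


def padding_bytes(mode, to_code, override=None):
    """Return the padding character for mode and to_code as bytes.

    override is an optional hexadecimal string such as "0x44e9".
    Assumption: two-byte values are written high byte first.
    """
    if override is not None:
        value = int(override, 16)
    else:
        value = DEFAULT_PADDING[mode][to_code]
    width = 2 if mode == "kanji_mode" else 1
    return value.to_bytes(width, "big")


if __name__ == "__main__":
    print(padding_bytes("kanji_mode", "SJIS").hex())
    print(padding_bytes("ebcdic_mode", "ibmkanji").hex())
```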
The default EBCDIC-ISO mapping tables are as follows:
For conversion from IBM Kanji to other codesets:
    /usr/lib/nls/loc/iconv/data/ebcdic_kana.tbl
For conversion from other codesets to IBM Kanji:
    /usr/lib/nls/loc/iconv/data/kana_ebcdic.tbl
These mapping tables map between EBCDIC and ISO code, which includes JIS
Roman characters. The kana_ebcdic.tbl mapping table also maps ISO
lowercase characters to EBCDIC uppercase characters.
The following default values for conversion control items are meaning‐
ful when the iconv utility's to-code conversion parameter is ibmkanji:
─────────────────────────────────────────────
Conversion Control Item           Default
─────────────────────────────────────────────
Output the initial shift code?    yes
Output the last shift code?       yes
Output the last status?           ebcdic_mode
─────────────────────────────────────────────
Environment Variables
This section discusses the environment variables that you can set to
control conversion behavior. The names for these variables adhere to
the following format:
fromcode_tocode_controlitem
The name segments for fromcode or tocode can be one of the following
keywords:
────────────────────────────
For Codeset:       Use:
────────────────────────────
IBM Kanji          IBMKANJI
DEC Kanji          DECKANJI
Super DEC Kanji    SDECKANJI
Japanese EUC       EUCJP
Shift JIS          SJIS
────────────────────────────
The name segments for controlitem can be one of the following keywords:
────────────────────────────────────────────────────────
For Control Item:                     Use:
────────────────────────────────────────────────────────
UDC mapping table                     UDC_TABLE
EBCDIC-ISO mapping table              EBCDIC_TABLE
K shift code                          K_SHIFT_CODE
A shift code                          A_SHIFT_CODE
Initial state                         INITIAL_STATE
Processing of undefined characters
  in Kanji mode                       KANJI_EXCEPT_PROC
Processing of undefined characters
  in EBCDIC mode                      EBCDIC_EXCEPT_PROC
Padding characters
  in Kanji mode                       PADDING_2BYTE_CHAR
Padding characters
  in EBCDIC mode                      PADDING_1BYTE_CHAR
Output initial
  shift code                          INITIAL_SHIFT_CODE
Output last
  shift code                          TRAILER_SHIFT_CODE
Last status                           LAST_STATE
File path of the profile              PROFILE
────────────────────────────────────────────────────────
Following are examples of using the setenv C shell command to define
environment variables to control conversion behavior. In these exam‐
ples, the fromcode name segment indicates Japanese EUC and the tocode
name segment indicates IBM Kanji:
setenv EUCJP_IBMKANJI_UDC_TABLE eucjp_ibmkanji_udc.tbl
setenv EUCJP_IBMKANJI_EBCDIC_TABLE kana_ebcdic.tbl
setenv EUCJP_IBMKANJI_K_SHIFT_CODE 0x0e
setenv EUCJP_IBMKANJI_A_SHIFT_CODE 0x0f
setenv EUCJP_IBMKANJI_INITIAL_STATE ebcdic_mode
setenv EUCJP_IBMKANJI_KANJI_EXCEPT_PROC replace
setenv EUCJP_IBMKANJI_EBCDIC_EXCEPT_PROC replace
setenv EUCJP_IBMKANJI_PADDING_2BYTE_CHAR 0x44e9
setenv EUCJP_IBMKANJI_PADDING_1BYTE_CHAR 0x40
setenv EUCJP_IBMKANJI_INITIAL_SHIFT_CODE yes
setenv EUCJP_IBMKANJI_TRAILER_SHIFT_CODE yes
setenv EUCJP_IBMKANJI_LAST_STATE ebcdic_mode
setenv EUCJP_IBMKANJI_PROFILE .eucjp_ibmkanji_profile
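The fromcode_tocode_controlitem naming format can be sketched in Python as follows. The helper name env_var_name is hypothetical; the name segments are the keywords listed in the two tables in this section.

```python
# Name segments copied from the codeset table in this section.
CODESET_SEGMENT = {
    "ibmkanji": "IBMKANJI",
    "deckanji": "DECKANJI",
    "sdeckanji": "SDECKANJI",
    "eucJP": "EUCJP",
    "SJIS": "SJIS",
}

# Control item keywords copied from the control item table.
CONTROL_ITEMS = {
    "UDC_TABLE", "EBCDIC_TABLE", "K_SHIFT_CODE", "A_SHIFT_CODE",
    "INITIAL_STATE", "KANJI_EXCEPT_PROC", "EBCDIC_EXCEPT_PROC",
    "PADDING_2BYTE_CHAR", "PADDING_1BYTE_CHAR", "INITIAL_SHIFT_CODE",
    "TRAILER_SHIFT_CODE", "LAST_STATE", "PROFILE",
}


def env_var_name(from_code, to_code, control_item):
    """Build an environment variable name for one control item."""
    if control_item not in CONTROL_ITEMS:
        raise ValueError(f"unknown control item: {control_item}")
    return "_".join(
        (CODESET_SEGMENT[from_code], CODESET_SEGMENT[to_code], control_item)
    )


if __name__ == "__main__":
    print(env_var_name("eucJP", "ibmkanji", "K_SHIFT_CODE"))
```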
Directory Search Path
When you specify a file name without a directory, the iconv utility
searches the following directories and uses the first file found:
    Current directory
    Home directory
    The iconv/data subdirectory of the directory specified by the
      environment variable LOCPATH
    /usr/lib/nls/loc/iconv/data
    /usr/i18n/lib/nls/loc/iconv/data
If you specify a relative directory path for a file, the utility
searches these same directories in the same order and uses the first
file found.
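The search order described in this section can be sketched in Python as follows. The function names candidate_dirs and find_file are hypothetical. The sketch assumes that LOCPATH names a single directory and that the home directory is taken from the HOME variable when it is set.

```python
import os


def candidate_dirs(environ=None):
    """Return the directories to search, in the documented order."""
    environ = os.environ if environ is None else environ
    dirs = [
        os.getcwd(),
        environ.get("HOME") or os.path.expanduser("~"),
    ]
    locpath = environ.get("LOCPATH")
    if locpath:
        # Assumption: LOCPATH holds one directory, not a list.
        dirs.append(os.path.join(locpath, "iconv", "data"))
    dirs.append("/usr/lib/nls/loc/iconv/data")
    dirs.append("/usr/i18n/lib/nls/loc/iconv/data")
    return dirs


def find_file(name, dirs):
    """Return the first existing path for name, or None."""
    for directory in dirs:
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            return path
    return None


if __name__ == "__main__":
    print(find_file("kana_ebcdic.tbl", candidate_dirs()))
```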
Profile File
Entry lines in the profile file adhere to the following format:
entry_name string_value
The entry_name and string_value fields are separated by spaces or tabs.
Do not append a colon (:) after entry_name. The file can also include
blank lines and comment entries, which begin with the # character.
Following are the entry_name values for different conversion control
items:
────────────────────────────────────────────────────────────
Conversion Control Item            entry_name
────────────────────────────────────────────────────────────
UDC mapping table                  udc_mapping_table
EBCDIC-ISO mapping table           ebcdic_mapping_table
K shift code                       k_shift_code
A shift code                       a_shift_code
Initial state                      initial_state
Processing undefined characters
  in Kanji mode                    kanji_except_proc
Processing undefined characters
  in EBCDIC mode                   ebcdic_except_proc
Padding character
  in Kanji mode                    padding_2byte_char
Padding character
  in EBCDIC mode                   padding_1byte_char
Output initial
  shift code                       output_initial_shift_code
Output last
  shift code                       output_trailer_shift_code
Last state                         last_state
────────────────────────────────────────────────────────────
Following is a sample profile for converting from Japanese EUC to IBM
Kanji.
#
# sample profile for eucJP_ibmkanji
#
udc_mapping_table          eucjp_ibmkanji_udc.tbl
ebcdic_mapping_table       kana_ebcdic.tbl
k_shift_code               0x0e            # ebcdic -> kanji
a_shift_code               0x0f            # kanji -> ebcdic
initial_state              ebcdic_mode
kanji_except_proc          replace
ebcdic_except_proc         replace
padding_2byte_char         0x44e9          # kanji mode
padding_1byte_char         0x40            # ebcdic mode
output_initial_shift_code  yes
output_trailer_shift_code  yes
last_state                 ebcdic_mode
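The profile format described above is simple enough to read with a few lines of code. The following Python sketch parses profile text into a dictionary. The function name parse_profile is hypothetical, and the sketch assumes that a # character starts a comment anywhere on a line, as the trailing comments in the sample suggest.

```python
def parse_profile(text):
    """Parse profile text into a dict of entry_name -> string_value.

    Assumption: '#' begins a comment wherever it appears on a line.
    """
    entries = {}
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue  # blank line or comment-only line
        fields = line.split(None, 1)  # split on spaces or tabs
        if len(fields) != 2:
            raise ValueError(f"line {number}: expected 'entry_name value'")
        name, value = fields
        entries[name] = value.strip()
    return entries


if __name__ == "__main__":
    sample = """\
#
# sample profile for eucJP_ibmkanji
#
k_shift_code        0x0e    # ebcdic -> kanji
a_shift_code        0x0f    # kanji -> ebcdic
initial_state       ebcdic_mode
"""
    for name, value in parse_profile(sample).items():
        print(name, value)
```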
The default file names for the profile are as follows:
───────────────────────────────────────────────────────────
Code Conversion                 Default Profile Name
───────────────────────────────────────────────────────────
IBM Kanji to DEC Kanji          .ibmkanji_deckanji_profile
IBM Kanji to Super DEC Kanji    .ibmkanji_sdeckanji_profile
IBM Kanji to Shift JIS          .ibmkanji_sjis_profile
IBM Kanji to Japanese EUC       .ibmkanji_eucjp_profile
DEC Kanji to IBM Kanji          .deckanji_ibmkanji_profile
Super DEC Kanji to IBM Kanji    .sdeckanji_ibmkanji_profile
Shift JIS to IBM Kanji          .sjis_ibmkanji_profile
Japanese EUC to IBM Kanji       .eucjp_ibmkanji_profile
───────────────────────────────────────────────────────────
By default, the iconv utility checks the directory search path men‐
tioned in the "Directory Search Path" section and uses the first pro‐
file it finds. However, you can also specify an arbitrary file path for
your profile instead of the default names by defining the following
environment variables:
─────────────────────────────────────────────────────────────────
Code Conversion                 Profile Path Environment Variable
─────────────────────────────────────────────────────────────────
IBM Kanji to DEC Kanji          IBMKANJI_DECKANJI_PROFILE
IBM Kanji to Super DEC Kanji    IBMKANJI_SDECKANJI_PROFILE
IBM Kanji to Shift JIS          IBMKANJI_SJIS_PROFILE
IBM Kanji to Japanese EUC       IBMKANJI_EUCJP_PROFILE
DEC Kanji to IBM Kanji          DECKANJI_IBMKANJI_PROFILE
Super DEC Kanji to IBM Kanji    SDECKANJI_IBMKANJI_PROFILE
Shift JIS to IBM Kanji          SJIS_IBMKANJI_PROFILE
Japanese EUC to IBM Kanji       EUCJP_IBMKANJI_PROFILE
─────────────────────────────────────────────────────────────────
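Both tables above follow a regular pattern, which the following Python sketch reproduces. The function names default_profile_name and profile_setting are hypothetical. The sketch only selects the name or path to use; it does not search any directories.

```python
def default_profile_name(from_code, to_code):
    """Return the default profile file name for a conversion."""
    return f".{from_code.lower()}_{to_code.lower()}_profile"


def profile_setting(from_code, to_code, environ):
    """Return the profile path from the environment, else the default.

    environ is any mapping of variable names to values, for example
    os.environ.
    """
    variable = f"{from_code.upper()}_{to_code.upper()}_PROFILE"
    return environ.get(variable, default_profile_name(from_code, to_code))


if __name__ == "__main__":
    print(profile_setting("eucJP", "ibmkanji", {}))
    print(profile_setting(
        "eucJP", "ibmkanji",
        {"EUCJP_IBMKANJI_PROFILE": "/tmp/my_profile"},  # placeholder path
    ))
```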
UDC Mapping Table
Entries in a UDC mapping table adhere to the following format:
fromcode tocode
Each of these values is a two-byte hexadecimal number. In the case of
Super DEC Kanji and Japanese EUC, three-byte hexadecimal values that
begin with SS3 (0x8f), such as 0x8fxxxx, are also valid.
You can specify ranges of UDC from and to values in the same file entry
by using a hyphen to separate the codes that start and end each range:
start_fromcode-end_fromcode start_tocode-end_tocode
When specifying entries that include ranges of values, the number of
codes in the from range must always equal the number of codes in the to
range. A UDC mapping table can also include blank lines and comment
lines, which begin with the # character. Following is an example of a
UDC mapping table:
# ibmkanji          eucJP
0x6941-0x72fe       0xf5a1-0xfefe        # udc
0x7341-0x7cfe       0x8ff5a1-0X8ffefe    # udc
0x7d41-0x7ffe       0x8feea1-0X8ff0fe    # udc
The first entry in this file specifies a range of IBM Kanji values from
0x6941 to 0x72fe that are mapped to Japanese EUC code values in the
range 0xf5a1 to 0xfefe. You can find additional sample UDC mapping ta‐
ble files in the /usr/i18n/examples/iconv/data directory.
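The entry format of a UDC mapping table can be read with a short parser, sketched here in Python. The function names are hypothetical. The sketch checks only the syntax of each code (two bytes, or three bytes beginning with 0x8f). It does not check that the from and to ranges contain the same number of codes, because this page does not state how codes within a range are counted.

```python
def parse_code(token):
    """Parse one hexadecimal code such as 0x6941 or 0x8ff5a1."""
    if token[:2].lower() != "0x":
        raise ValueError(f"missing 0x prefix: {token}")
    digits = token[2:]
    two_byte = len(digits) == 4
    three_byte = len(digits) == 6 and digits[:2].lower() == "8f"
    if not (two_byte or three_byte):
        raise ValueError(f"unexpected code width: {token}")
    return int(digits, 16)


def parse_range(field):
    """Parse 'start-end' or a single code into a (start, end) pair."""
    start, sep, end = field.partition("-")
    low = parse_code(start)
    high = parse_code(end) if sep else low
    return (low, high)


def parse_udc_table(text):
    """Return a list of ((from_start, from_end), (to_start, to_end))."""
    entries = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue  # blank line or comment line
        fields = line.split()
        if len(fields) != 2:
            raise ValueError(f"expected two fields: {raw!r}")
        entries.append((parse_range(fields[0]), parse_range(fields[1])))
    return entries


if __name__ == "__main__":
    sample = "0x6941-0x72fe 0xf5a1-0xfefe # udc\n"
    print(parse_udc_table(sample))
```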
EBCDIC-ISO Mapping Table
Entries in an EBCDIC-ISO mapping table adhere to the following format:
fromcode tocode
Each code is a one-byte hexadecimal number. You can specify a range of
character codes as follows:
start_fromcode-end_fromcode start_tocode-end_tocode
When using the range format, the number of hex values in the from range
must be the same as the number of hex values in the to range.
The EBCDIC-ISO mapping table can also include blank lines and comment
entries, which begin with the # character.
Following is an example of an EBCDIC-ISO code mapping table:
# EBCDIC        Kana
0x40            0x20            # space
0x4f            0x21            # '!'
0x7f            0x22            # '"'
  .               .
  .               .
  .               .
0xc1-0xc9       0x41-0x49       # 'A' - 'I'
0xd1-0xd9       0x4a-0x52       # 'J' - 'R'
0xe2-0xe9       0x53-0x5a       # 'S' - 'Z'
  .               .
  .               .
  .               .
In this example, the first column contains from codes and the second
column contains to codes. The first three entry lines shown specify
mapping for single characters, whereas the last three entry lines shown
specify mapping for ranges of characters. You can find additional
sample EBCDIC-ISO mapping tables in the
/usr/i18n/lib/nls/loc/iconv/data directory.
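The following Python sketch expands the entries of an EBCDIC-ISO mapping table into a byte-to-byte dictionary and enforces the rule that both ranges hold the same number of values. The function names are hypothetical. The sketch assumes that a range covers consecutive byte values and that the rows of dots in the example above stand for omitted entries and are not part of the file syntax.

```python
def byte_range(field):
    """Expand '0xc1-0xc9' or '0x40' into a range of byte values."""
    start, sep, end = field.partition("-")
    low = int(start, 16)
    high = int(end, 16) if sep else low
    if not 0 <= low <= high <= 0xFF:
        raise ValueError(f"not a valid one-byte range: {field}")
    return range(low, high + 1)


def parse_byte_table(text):
    """Return a dict that maps each from byte to its to byte."""
    mapping = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue  # blank line or comment line
        fields = line.split()
        if len(fields) != 2:
            raise ValueError(f"expected two fields: {raw!r}")
        source, target = byte_range(fields[0]), byte_range(fields[1])
        if len(source) != len(target):
            raise ValueError(f"range lengths differ: {raw!r}")
        mapping.update(zip(source, target))
    return mapping


if __name__ == "__main__":
    sample = """\
# EBCDIC        Kana
0x40            0x20            # space
0xc1-0xc9       0x41-0x49       # 'A' - 'I'
"""
    table = parse_byte_table(sample)
    print(bytes(table[b] for b in b"\xc1\x40\xc2"))
```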
NOTES
This reference page contains code conversion specifications that apply
only to conversion between IBM Kanji System characters and the DEC
Kanji, Super DEC Kanji, Japanese EUC, and Shift JIS codesets. Refer to
iconv_JEF(5) for code conversion specifications between Fujitsu JEF
characters and the DEC Kanji, Super DEC Kanji, Japanese EUC, and Shift
JIS codesets. Refer to iconv_KEIS(5) for code conversion specifications
between Hitachi KEIS characters and the DEC Kanji, Super DEC Kanji, Ja‐
panese EUC, and Shift JIS codesets. Refer to iconv_intro(5) for infor‐
mation about conversion between DEC Kanji, Super DEC Kanji, Japanese
EUC, Shift JIS, and other Tru64 UNIX codesets.
SEE ALSO
Commands: iconv(1)
Functions: iconv(3), iconv_close(3), iconv_open(3)
Others: deckanji(5), eucJP(5), iconv_intro(5), iconv_JEF(5),
iconv_KEIS(5), Japanese(5), sdeckanji(5), SJIS(5)