charmap(4)

NAME
charmap - Defines character symbols as character encodings
DESCRIPTION
The character set description (charmap) file defines character symbols
as character encodings. This file is the source file for a coded
character set, or codeset. All supported codesets have the Portable
Character Set (PCS) as a proper subset. The PCS consists of the following
character symbols (listed by their standardized symbolic names) and
hexadecimal encodings:
───────────────────────────────────────────
Symbol Name Hexadecimal Encoding
───────────────────────────────────────────
<NUL> \x00
<SOH> \x01
<STX> \x02
<ETX> \x03
<EOT> \x04
<ENQ> \x05
<ACK> \x06
<alert> \x07
<backspace> \x08
<tab> \x09
<newline> \x0A
<vertical-tab> \x0B
<form-feed> \x0C
<carriage-return> \x0D
<SO> \x0E
<SI> \x0F
<DLE> \x10
<DC1> \x11
<DC2> \x12
<DC3> \x13
<DC4> \x14
<NAK> \x15
<SYN> \x16
<ETB> \x17
<CAN> \x18
<EM> \x19
<SUB> \x1A
<ESC> \x1B
<IS4> \x1C
<IS3> \x1D
<IS2> \x1E
<IS1> \x1F
<space> \x20
<exclamation-mark> \x21
<quotation-mark> \x22
<number-sign> \x23
<dollar-sign> \x24
<percent> \x25
<ampersand> \x26
<apostrophe> \x27
<left-parenthesis> \x28
<right-parenthesis> \x29
<asterisk> \x2A
<plus-sign> \x2B
<comma> \x2C
<hyphen> \x2D
<period> \x2E
<slash> \x2F
<zero> \x30
<one> \x31
<two> \x32
<three> \x33
<four> \x34
<five> \x35
<six> \x36
<seven> \x37
<eight> \x38
<nine> \x39
<colon> \x3A
<semi-colon> \x3B
<less-than> \x3C
<equal-sign> \x3D
<greater-than> \x3E
<question-mark> \x3F
<commercial-at> \x40
<A> \x41
<B> \x42
<C> \x43
<D> \x44
<E> \x45
<F> \x46
<G> \x47
<H> \x48
<I> \x49
<J> \x4A
<K> \x4B
<L> \x4C
<M> \x4D
<N> \x4E
<O> \x4F
<P> \x50
<Q> \x51
<R> \x52
<S> \x53
<T> \x54
<U> \x55
<V> \x56
<W> \x57
<X> \x58
<Y> \x59
<Z> \x5A
<left-bracket> \x5B
<backslash> \x5C
<right-bracket> \x5D
<circumflex> \x5E
<underscore> \x5F
<grave-accent> \x60
<a> \x61
<b> \x62
<c> \x63
<d> \x64
<e> \x65
<f> \x66
<g> \x67
<h> \x68
<i> \x69
<j> \x6A
<k> \x6B
<l> \x6C
<m> \x6D
<n> \x6E
<o> \x6F
<p> \x70
<q> \x71
<r> \x72
<s> \x73
<t> \x74
<u> \x75
<v> \x76
<w> \x77
<x> \x78
<y> \x79
<z> \x7A
<left-brace> \x7B
<vertical-line> \x7C
<right-brace> \x7D
<tilde> \x7E
<DEL> \x7F
───────────────────────────────────────────
The charmap file has the following components:

An optional special symbolic name declarations section
Each declaration in this section consists of a special symbolic name,
followed by one or more space or tab characters, and a value. The
following list describes the special symbolic names that you can include
in the declarations section:

    <code_set_name>
        Specifies the name of the codeset for which the charmap file is
        defined. This value determines the value returned by the
        nl_langinfo(CODESET) subroutine. If <code_set_name> is not
        declared, the name of the Portable Character Set is used.

    <mb_cur_max>
        Specifies the maximum number of bytes in a character for the
        codeset. Valid values are 1 to 4. The default value is 1.

    <mb_cur_min>
        Specifies the minimum number of bytes in a character for the
        codeset. Because all supported codesets have the Portable
        Character Set as a proper subset, this value must be 1.

    <escape_char>
        Specifies the escape character that introduces encodings in
        decimal, octal, or hexadecimal notation. The default value is a
        \ (backslash).

    <comment_char>
        Specifies the character used to indicate a comment within a
        charmap file. The default value is a # (number sign).
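Once a locale built from the charmap is in effect, two of these declarations become visible to programs: the codeset name through nl_langinfo(CODESET), and the per-character byte limit through the standard C macro MB_CUR_MAX. The following is a minimal sketch, assuming a POSIX system that provides <langinfo.h>; codeset_info is a hypothetical helper name, and it reports whatever locale is currently active (the C locale unless the program has called setlocale).

```c
#include <langinfo.h>
#include <locale.h>
#include <stdlib.h>

/* Hypothetical helper: report the codeset name and the maximum number of
   bytes per character for the locale currently in effect. */
size_t codeset_info(const char **name)
{
    /* Codeset name of the active locale (compare <code_set_name>). */
    *name = nl_langinfo(CODESET);

    /* Maximum bytes per character in the active locale (compare <mb_cur_max>). */
    return MB_CUR_MAX;
}
```

Calling setlocale(LC_ALL, "") before codeset_info makes it report the codeset of the locale named by the user's environment, provided that locale is installed on the system.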

The CHARMAP section header
This header marks the beginning of the section that associates character
symbols with encodings.

Mapping statements for characters in the codeset
Each statement lists a symbolic name for a character and its associated
encoding. The format of a mapping statement is:

    <char_symbol> encoding

A symbolic name begins with the < (left-angle bracket) character and ends
with the > (right-angle bracket) character. The characters for
char_symbol (between < and >) can be any characters from the Portable
Character Set, except for control and space characters. The right-angle
bracket (>) can occur within char_symbol as well as in the last position
of the name. You must precede every > character except the final, closing
one with the escape character (as specified by the <escape_char> special
symbolic name).
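The escaping rule for > can be illustrated with a small scanner for symbolic names. This is a sketch under stated assumptions, not the localedef implementation: parse_symbol is a hypothetical helper, the escape character is passed in by the caller (normally \, the <escape_char> default), and control and space characters are rejected by their ASCII values, which assumes the PCS encodings listed in the table above.

```c
#include <stddef.h>

/* Hypothetical helper: copy the symbolic name starting at s (which must
   point at '<') into out without its escape characters. Returns a pointer
   just past the closing '>', or NULL if the name is malformed or too long. */
const char *parse_symbol(const char *s, char esc, char *out, size_t outsz)
{
    size_t n = 0;

    if (*s != '<' || outsz == 0)
        return NULL;
    s++;
    while (*s != '\0') {
        char c = *s++;

        if (c == esc) {
            /* Take the escaped character (for example an inner '>') literally. */
            c = *s++;
            if (c == '\0')
                return NULL;
        } else if (c == '>') {
            /* An unescaped '>' closes the name; empty names are rejected. */
            out[n] = '\0';
            return n > 0 ? s : NULL;
        }
        /* Control and space characters are not allowed (ASCII values assumed). */
        if ((unsigned char)c <= 0x20 || (unsigned char)c == 0x7F)
            return NULL;
        if (n + 1 >= outsz)
            return NULL;
        out[n++] = c;
    }
    return NULL; /* reached the end of the string without a closing '>' */
}
```

With the default escape character, the name written as <a\>b> is read as a>b, and the name written as <x\>> is read as x>.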
An encoding is specified as one or more character constants, with the
maximum number of character constants specified by the <mb_cur_max>
special symbolic name. The encoding may be listed as decimal, octal, or
hexadecimal constants with the following formats, where the leading \
stands for the escape character:

    \xhh            hexadecimal: \x followed by two hexadecimal digits (h)
    \ooo or \oo     octal: three or two octal digits (o)
    \dnnn or \dnn   decimal: \d followed by three or two decimal digits (n)
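A sketch of converting an encoding into bytes, following the three constant formats above. Assumptions: parse_encoding is a hypothetical helper, the caller has already separated the encoding from any trailing comment, and the caller's buffer size stands in for the <mb_cur_max> limit.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: convert an encoding such as "\x81\xA1" or "\d65"
   into bytes. Accepts \x plus two hexadecimal digits, \d plus two or three
   decimal digits, or two or three octal digits after the escape character.
   Returns the number of bytes stored in out, or -1 if malformed. */
int parse_encoding(const char *s, char esc, unsigned char *out, size_t outsz)
{
    size_t n = 0;

    while (*s == esc) {
        unsigned value = 0;
        int digits = 0, min, max, base;

        s++;
        if (*s == 'x') {            /* hexadecimal constant */
            base = 16; min = 2; max = 2; s++;
        } else if (*s == 'd') {     /* decimal constant */
            base = 10; min = 2; max = 3; s++;
        } else {                    /* octal constant: digits follow directly */
            base = 8; min = 2; max = 3;
        }
        while (digits < max && *s != '\0') {
            unsigned char c = (unsigned char)*s;
            int d;

            if (isdigit(c))
                d = c - '0';
            else if (isxdigit(c))
                d = tolower(c) - 'a' + 10;
            else
                break;
            if (d >= base)          /* digit not valid in this base */
                break;
            value = value * base + d;
            digits++;
            s++;
        }
        /* Each constant must have enough digits and fit in one byte. */
        if (digits < min || value > 0xFF || n >= outsz)
            return -1;
        out[n++] = (unsigned char)value;
    }
    return (*s == '\0' && n > 0) ? (int)n : -1;
}
```

For instance, \d65, \101, and \x41 all yield the single byte 65 (the encoding of <A> in the table above), and \x81\xA1 yields two bytes.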
Some examples of character symbol definitions are the following:

    <A>        \d65        #decimal constant
    <B>        \x42        #hexadecimal constant
    <j10101>   \x81\xA1    #multiple hexadecimal constants
A range of symbolic names and corresponding encoded values may also be
defined, where the nonnumeric prefix of each symbolic name is common, and
the numeric portion of the second symbolic name is equal to or greater
than the numeric portion of the first symbolic name. In this format, a
symbolic name value consists of zero or more nonnumeric characters
followed by an integer of one or more decimal digits. This format defines
a series of symbolic names. For example, the string

    <j0101>...<j0104>

is interpreted as the <j0101>, <j0102>, <j0103>, and <j0104> symbolic
names, in that order.
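Expanding a range such as <j0101>...<j0104> amounts to splitting each name into its nonnumeric prefix and trailing decimal number, checking that the prefixes match and that the second number is not smaller, and then counting upward. The sketch below assumes the digit width of the first name is preserved when formatting (so j0101 is followed by j0102, not j102); range_name and split_name are hypothetical helpers, and names are passed without their < and > brackets.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Split a name such as "j0101" into prefix length, numeric value, and digit
   count. Returns 0 on success, -1 if the name has no trailing digits. */
static int split_name(const char *name, size_t *prefix_len,
                      unsigned long *num, int *width)
{
    size_t len = strlen(name), i = len;

    while (i > 0 && isdigit((unsigned char)name[i - 1]))
        i--;
    if (i == len)
        return -1;
    *prefix_len = i;
    *width = (int)(len - i);
    *num = strtoul(name + i, NULL, 10);
    return 0;
}

/* Hypothetical helper: return the number of names in first...last, or -1 if
   the prefixes differ or last is smaller than first. If 0 <= k < count, also
   write the k-th name (0-based) of the range into out. */
long range_name(const char *first, const char *last, long k,
                char *out, size_t outsz)
{
    size_t plen1, plen2;
    unsigned long lo, hi;
    int w1, w2;
    long count;

    if (split_name(first, &plen1, &lo, &w1) != 0 ||
        split_name(last, &plen2, &hi, &w2) != 0 ||
        plen1 != plen2 || strncmp(first, last, plen1) != 0 || hi < lo)
        return -1;
    count = (long)(hi - lo + 1);
    if (k >= 0 && k < count)
        /* Keep the zero-padded width of the first name: j0101 -> j0103. */
        snprintf(out, outsz, "%.*s%0*lu", (int)plen1, first, w1,
                 lo + (unsigned long)k);
    return count;
}
```

Looping k from 0 to the returned count minus one yields the names of the range in order.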
In statements defining ranges of symbolic names, the encoded value listed
is the value for the first symbolic name in the range. Subsequent
symbolic names have encoded values in increasing order. For example:

    <j0101>...<j0104> \d129\d254

The preceding statement is interpreted as follows:

    <j0101> \d129\d254
    <j0102> \d129\d255
    <j0103> \d130\d00
    <j0104> \d130\d01

When the last byte passes 255, it wraps to 0 and the preceding byte is
incremented, as in the step from <j0102> to <j0103>.
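The encoded values in the range example advance like a multi-digit number written in base 256: the last byte is incremented, and when it passes 255 it wraps to zero and carries into the byte before it. A minimal sketch of that step, assuming a plain carry with no byte values skipped, as the example shows; next_encoding is a hypothetical helper.

```c
#include <stddef.h>

/* Hypothetical helper: advance a multibyte encoding to the next value,
   treating the bytes as one big-endian base-256 number. Returns 0 on
   success, or -1 if every byte was already 255 (the bytes are then all 0). */
int next_encoding(unsigned char *bytes, size_t len)
{
    for (size_t i = len; i > 0; i--) {
        if (bytes[i - 1] < 0xFF) {
            bytes[i - 1]++;     /* no carry needed: done */
            return 0;
        }
        bytes[i - 1] = 0;       /* wrap this byte and carry into the one before */
    }
    return -1;
}
```

Starting from the bytes 129 and 254 and calling next_encoding three times produces 129 255, then 130 0, then 130 1, which are the encodings of <j0102> through <j0104>.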
Although you cannot assign multiple encodings to one symbolic name, you
can create multiple names for one encoded value. This is allowed because
some characters have several common names. For example, the "."
character is called a period in some parts of the world and a full stop
in others. Both names may appear in the charmap:

    <period>      \x2e
    <full-stop>   \x2e
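The rule that a name may have only one encoding, while an encoding may have several names, can be checked mechanically once the mapping statements are parsed. A sketch with a hypothetical struct mapping type and has_duplicate_name helper: it flags a repeated symbolic name but deliberately ignores repeated encodings, such as the shared \x2e of <period> and <full-stop>.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory form of one mapping statement. */
struct mapping {
    const char *name;        /* symbolic name without < and > */
    unsigned char code[4];   /* encoding bytes (at most 4, per <mb_cur_max>) */
    size_t len;              /* number of bytes used in code */
};

/* Returns 1 if any symbolic name appears more than once (not allowed),
   0 otherwise. Several names sharing one encoding is not reported. */
int has_duplicate_name(const struct mapping *m, size_t n)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++)
            if (strcmp(m[i].name, m[j].name) == 0)
                return 1;
    return 0;
}
```

The pairwise comparison keeps the sketch short; for a codeset with thousands of names, sorting the names or using a hash table would likely be preferable.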
If used, comments must begin with the character specified by the
<comment_char> special symbolic name. When an entire line is a comment,
you must specify <comment_char> in the first column of the line.

The END CHARMAP trailer
This entry denotes the end of character map statements.
The following example is a portion of a possible charmap file:

    <code_set_name>    "ISO8859-1"
    <mb_cur_max>       1
    <mb_cur_min>       1
    <escape_char>      \
    <comment_char>     #
    CHARMAP
    <NUL>              \x00
    <SOH>              \x01
    <STX>              \x02
    <ETX>              \x03
    <EOT>              \x04
    <ENQ>              \x05
    <ACK>              \x06
    <alert>            \x07
    <backspace>        \x08
    <tab>              \x09
    <newline>          \x0a
    <vertical-tab>     \x0b
    <form-feed>        \x0c
    <carriage-return>  \x0d
    END CHARMAP
FILES
/usr/lib/nls/loc/charmaps
    Character set description (charmap) source files for supported
    locales. The /usr/lib/nls/loc/charmaps directory does not exist when
    source files for installed locales are not provided.
SEE ALSO
Commands: locale(1), localedef(1)
Files: locale(4)