unidesc(1)							    unidesc(1)

NAME
       unidesc - Describe the contents of a Unicode text file

SYNOPSIS
       unidesc [option flags] [file name]

       If  no  input  file  name  is supplied, unidesc reads from the standard
       input.

DESCRIPTION
       unidesc describes the content of a Unicode text file by reporting the
       character ranges to which different portions of the text belong. The
       ranges reported include both official Unicode ranges and the
       constructed language ranges within the Private Use Areas registered
       with the Conscript Unicode Registry
       (http://www.evertype.com/standards/csur/). For each range of
       characters, unidesc prints the character or byte offset of the
       beginning of the range, the character or byte offset of the end of
       the range, and the name of the range. Offsets start from 0.
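
       The run-by-run reporting described above can be sketched in Python.
       This is an illustration only, not unidesc's implementation: the
       three-entry BLOCKS table, the function names, the inclusive end
       offset, and the printed layout are all assumptions made for the
       example.

```python
# Illustrative subset of Unicode block ranges; the real tool knows many more.
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0370, 0x03FF, "Greek and Coptic"),
    (0x0400, 0x04FF, "Cyrillic"),
]


def block_name(ch):
    """Return the name of the block containing ch, or "Unknown"."""
    cp = ord(ch)
    for lo, hi, name in BLOCKS:
        if lo <= cp <= hi:
            return name
    return "Unknown"


def describe(text):
    """Return (start, end, name) runs using 0-based character offsets.

    The end offset is inclusive here; that is an assumption of this sketch.
    """
    runs = []
    for i, ch in enumerate(text):
        name = block_name(ch)
        if runs and runs[-1][2] == name:
            # Same block as the previous character: extend the current run.
            runs[-1] = (runs[-1][0], i, name)
        else:
            # Block changed: start a new run at this offset.
            runs.append((i, i, name))
    return runs


for start, end, name in describe("abcαβγдж"):
    print(start, end, name)
```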

       Since the ASCII digits, punctuation, and whitespace characters are
       frequently used by other writing systems, by default these characters
       are treated as neutral, that is, as not belonging exclusively to any
       particular character range. These characters are treated as belonging
       to the range of whatever characters precede them.

       If the input begins with neutral characters, they are treated as
       belonging to the range of whatever characters follow them. If the file
       consists entirely of neutral characters, the range is identified as
       Neutral followed by Basic Latin in square brackets.
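
       The neutral-character rules can be sketched as follows. This is a
       minimal illustration under stated assumptions: classify() is a
       hypothetical two-block classifier, and the exact label text
       "Neutral [Basic Latin]" is a guess at the format the paragraph above
       describes.

```python
import string

# ASCII digits, punctuation, and whitespace are neutral by default.
NEUTRAL = set(string.digits + string.punctuation + string.whitespace)


def classify(ch):
    """Hypothetical classifier: returns None for neutral characters."""
    if ch in NEUTRAL:
        return None
    cp = ord(ch)
    if cp <= 0x7F:
        return "Basic Latin"
    if 0x0370 <= cp <= 0x03FF:
        return "Greek and Coptic"
    return "Other"


def resolve_neutrals(text):
    """Return one range label per character, with neutrals resolved."""
    labels = [classify(ch) for ch in text]

    # Neutral characters inherit the range of whatever precedes them.
    previous = None
    for i, label in enumerate(labels):
        if label is None:
            labels[i] = previous
        else:
            previous = label

    # Anything still unresolved is a leading neutral; it takes the range of
    # the first non-neutral character that follows.
    first = next((label for label in labels if label is not None), None)
    if first is None:
        # Input is entirely neutral (label format assumed).
        return ["Neutral [Basic Latin]"] * len(text)
    return [label if label is not None else first for label in labels]
```

       For example, resolve_neutrals("  ab: αβ!") assigns the two leading
       spaces, the colon, and the following space to Basic Latin, and the
       trailing exclamation mark to Greek and Coptic.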

       A magic number identifying the Unicode encoding is not part of the
       Unicode standard, so pure Unicode files do not contain a magic number.
       However, informal conventions have arisen for this purpose. If the
       command line flag -m is given, unidesc will attempt to identify the
       Unicode subtype by examining the first few bytes of the input. If the
       input is identified as one of the two acceptable types, UTF-8 or
       native order UTF-32, it will then proceed to describe the contents of
       the input. Otherwise, it will report what it has learned and exit.
       Note that if the file does contain a magic number, you must use the -m
       flag. Without this flag unidesc assumes that the input consists of
       pure Unicode with the character data beginning immediately. It will
       therefore be thrown off by the magic number.
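
       A check of the leading bytes might look like the sketch below. The
       man page does not say which byte signatures unidesc recognizes; the
       sketch assumes the common byte order mark conventions as exposed by
       Python's codecs module. The names sniff_magic and is_acceptable are
       hypothetical.

```python
import codecs
import sys


def sniff_magic(head):
    """Guess the encoding from the first bytes of the input, or None.

    The UTF-32 signatures are tested before UTF-16 because the UTF-32
    little-endian mark (FF FE 00 00) begins with the UTF-16 one (FF FE).
    """
    if head.startswith(codecs.BOM_UTF8):
        return "UTF-8"
    if head.startswith(codecs.BOM_UTF32_LE):
        return "UTF-32LE"
    if head.startswith(codecs.BOM_UTF32_BE):
        return "UTF-32BE"
    if head.startswith(codecs.BOM_UTF16_LE):
        return "UTF-16LE"
    if head.startswith(codecs.BOM_UTF16_BE):
        return "UTF-16BE"
    return None


# "Native order" depends on the machine running the program.
NATIVE_UTF32 = "UTF-32LE" if sys.byteorder == "little" else "UTF-32BE"


def is_acceptable(kind):
    """Only UTF-8 and native order UTF-32 are described; others are reported."""
    return kind in ("UTF-8", NATIVE_UTF32)
```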

       By default, input is expected to be UTF-8. Native order UTF-32 is also
       acceptable. UTF-32 may be specified via the command line flag -u or,
       if the command line flag -m is given, via the magic number.
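
       "Native order" means each character is stored as a 4-byte integer in
       the byte order of the machine reading the file. A minimal sketch of
       such a decoder, assuming no magic number is present (the function
       name is hypothetical):

```python
import struct


def decode_native_utf32(data):
    """Decode bytes holding 4-byte code points in the host's byte order."""
    if len(data) % 4 != 0:
        raise ValueError("UTF-32 input length must be a multiple of 4 bytes")
    count = len(data) // 4
    # "=" selects native byte order with standard sizes; "I" is 4 bytes.
    codepoints = struct.unpack("=%dI" % count, data)
    return "".join(chr(cp) for cp in codepoints)
```

       A file written on a machine with the opposite byte order will not
       decode correctly this way, which is why only native order is listed
       as acceptable.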

COMMAND LINE FLAGS
       -b     Give file offsets in bytes rather than characters.

       -d     Treat the ASCII digits as belonging exclusively to the Basic
              Latin range.

       -h     Print usage information.

       -L     List the Unicode ranges alphabetically.

       -l     List the Unicode ranges by codepoint.

       -m     Check the file's magic number to determine the Unicode subtype.

       -p     Treat ASCII punctuation as belonging exclusively to the Basic
              Latin range.

       -r     Instead of listing ranges as they are encountered, just list the
              ranges detected after all input has been read.

       -u     Input is native order UTF-32.

       -v     Print version information.

       -w     Treat ASCII whitespace as belonging exclusively to the Basic
              Latin range.
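
       The difference between character offsets and the byte offsets
       selected by -b can be illustrated with a short sketch. It is not
       taken from unidesc; the function name is hypothetical. In UTF-8 the
       two diverge as soon as a non-ASCII character appears, while in UTF-32
       the byte offset is always four times the character offset.

```python
def byte_offsets(text, encoding="utf-8"):
    """Return the byte offset at which each character of text starts.

    For UTF-32, pass "utf-32-le" or "utf-32-be": Python's plain "utf-32"
    codec prepends a byte order mark to every encode() call, which would
    inflate the per-character sizes.
    """
    offsets = []
    position = 0
    for ch in text:
        offsets.append(position)
        position += len(ch.encode(encoding))
    return offsets
```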

SEE ALSO
       uniname(1)

REFERENCES
       Unicode Standard, version 5.0

AUTHOR
       Bill Poser
       billposer@alum.mit.edu

LICENSE
       GNU General Public License

				  June, 2007			    unidesc(1)