MSORT(1)			 User Commands			      MSORT(1)

NAME
       msort - sort records in complex ways

SYNOPSIS
       msort <options> [<input file>]

DESCRIPTION
       msort is a program for sorting text files in sophisticated ways. It
       was developed initially for alphabetizing dictionaries of languages in
       which the ordering may be quite different from English, but it has
       many other uses.

       msort allows you to sort blocks of text delimited in a number of ways,
       rather than just lines, and to specify particular fields of a record as
       sort keys, either by their position, counted from either end, or by
       matching regular expressions to their tags.

       msort  is capable of sorting on multiple keys, so that when two records
       tie on one key, the tie may be broken on another. Any or all  keys  may
       be  optional.   How  absent  optional  keys are ordered with respect to
       present keys may be set separately for each key.
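
       As a rough illustration of tie-breaking with an optional key, the
       following Python sketch (not msort's own code; the record layout and
       the names sort_key and ABSENT_FIRST are hypothetical) sorts on a
       primary key, breaks ties on a second key, and places records that
       lack the second key before those that have it.

```python
# Hypothetical records: (surname, optional given name). None marks an absent key.
records = [
    ("Smith", "Zoe"),
    ("Jones", None),
    ("Smith", None),
    ("Jones", "Al"),
    ("Smith", "Amy"),
]

ABSENT_FIRST = True  # assumption: an absent key sorts before a present one


def sort_key(record):
    surname, given = record
    if given is None:
        presence = 0 if ABSENT_FIRST else 1
        given = ""
    else:
        presence = 1 if ABSENT_FIRST else 0
    # Primary key first; the presence flag and given name break ties.
    return (surname, presence, given)


result = sorted(records, key=sort_key)
```

       Setting ABSENT_FIRST to False moves the records without a given name
       to the end of each surname group instead.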

       msort allows you to specify arbitrary sort orders and to define
       virtually unlimited numbers of multigraphs (sequences of characters
       treated as a single unit for sorting) of effectively unlimited length.
       The sort order and multigraphs are defined separately for each key. If
       your system has locale support, you can also use locale collation
       rules instead of specifying your own sort order.
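
       To make the idea of a custom sort order with multigraphs concrete,
       here is a minimal Python sketch. It does not reproduce msort's sort
       order file format or its internals; the ORDER list, the name
       collation_key, and the rule that undefined characters sort last are
       all assumptions made for the example.

```python
# Hypothetical sort order in which "ch" and "ll" are single collating units,
# as in traditional Spanish alphabetization.
ORDER = ["a", "b", "c", "ch", "d", "e", "f", "g", "h", "i", "j", "k", "l",
         "ll", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x",
         "y", "z"]
RANK = {unit: i for i, unit in enumerate(ORDER)}
MAX_LEN = max(len(unit) for unit in ORDER)


def collation_key(word):
    """Split word into the longest matching units and return their ranks."""
    ranks = []
    i = 0
    while i < len(word):
        for length in range(MAX_LEN, 0, -1):
            unit = word[i:i + length]
            if unit in RANK:
                ranks.append(RANK[unit])
                i += len(unit)
                break
        else:
            # Assumption: characters missing from ORDER sort after all others.
            ranks.append(len(ORDER))
            i += 1
    return ranks


words = ["chico", "llama", "cuna", "luz", "dama"]
result = sorted(words, key=collation_key)
```

       Because "ch" ranks after "c" and "ll" after "l", "chico" follows
       "cuna" and "llama" follows "luz".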

       msort provides twelve types of key comparison: lexicographic,  numeric,
       numeric	string, hybrid, by string length, by angle, by date, by domain
       name, by time, by ISO8601 date/time stamp, by month name, and random.

       Which month names are recognized for a month name key depends on the
       -s flag given for that key. If the argument of -s is the name of a
       file, the month names are read from that file, which should be in the
       same format as a sort order definition file. If the argument of -s is
       a locale name, the month names and abbreviations associated with that
       locale are recognized. If the -s flag is not used, the month names and
       abbreviations associated with the current locale are recognized. If
       your system does not have locale support and you do not use the -s
       flag to read the month names from a file, the English month names and
       abbreviations are recognized.

       msort can reverse the characters in a key, allowing it to  be  used  to
       generate reverse dictionaries.
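
       The effect of reversing a key can be sketched in a few lines of
       Python. This is only an illustration of the idea with a made-up word
       list, not a description of how msort implements it.

```python
# Sort words by their characters read from right to left, so that words
# with the same ending are grouped together.
words = ["walking", "banana", "sing", "panda", "ring"]
result = sorted(words, key=lambda w: w[::-1])
```

       The words ending in "-ing" end up adjacent, which is the property a
       reverse dictionary relies on.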

       A choice of sorting algorithms is provided.

       msort fully supports Unicode. The text to be sorted, and all specifica‐
       tions, should be in UTF-8 Unicode. (If you have plain ASCII text,  this
       is  not	a problem as ASCII is a subset of Unicode.) Full Unicode case-
       folding is available, in Turkic and non-Turkic variants.	 Unicode  nor‐
       malization is performed before sorting.

       For usage information, execute msort with no arguments.

       Full  information about msort is currently to be found in the reference
       manual, which is distributed as a PDF (Portable Document Format)	 file.
       If  a  copy  is not available locally, you can download it from msort's
       home page:
       http://billposer.org/Software/msort.html

OPTIONS
   Informational options
       -h,--help
	      Print usage message

       -v,--version
	      Print version message

       -D,--defaults
	      List defaults

       -F,--general-options
	      List general command line options

       -G,--gnu-equivalences
	      List equivalents for GNU sort command line options.

       -H,--informational-options
	      List informational command line options

       -K,--key-specific-options
	      List key-specific command line options

       -L,--limits
	      List limits

       -N,--number-systems
	      List the supported number systems.

   General options
       -b,--block
	      A record is terminated by two or more newlines
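
       As a sketch of what block-delimited records look like, the Python
       fragment below splits text on runs of two or more newlines and sorts
       the resulting blocks. The sample text and the name split_blocks are
       hypothetical, and the sort is a plain string comparison rather than
       any msort comparison type.

```python
import re

# Three multi-line records separated by blank lines (one separator is longer).
text = "cat\nn.\n\napple\nn.\n\n\nbird\nn.\n"


def split_blocks(data):
    """Split data into records terminated by two or more newlines."""
    blocks = re.split(r"\n{2,}", data.strip("\n"))
    return [block for block in blocks if block]


records = split_blocks(text)
result = sorted(records)
```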

       -l,--line
	      A record consists of a single line

       -r,--record-separator <separator>
              A record is terminated by the separator character

       -O,--fixed-size-record <bytes>
	      A record consists of the specified number of bytes.

       -d,--field-separators <character>+
	      Fields are delimited by the named character(s)

       -w,--whole
	      Sort on the entire text of the record

       -a,--algorithm <algorithm>
	      Use the specified sort algorithm. The choices  are:  I(nsertion‐
	      Sort),  M(ergeSort),  Q(uickSort),  and  S(hellSort).  Note that
	      InsertionSort and MergeSort  are	stable,	 while	QuickSort  and
	      ShellSort are unstable. The default is QuickSort.
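
       The practical meaning of stability can be seen in a short Python
       sketch. Python's built-in sorted is documented as stable, so it
       stands in here for a stable algorithm; the records are made up for
       the example.

```python
# Records as (name, group). Sorting on the group alone leaves ties.
records = [("pear", 2), ("apple", 1), ("fig", 2), ("kiwi", 1)]

# A stable sort keeps tied records in their original input order.
result = sorted(records, key=lambda r: r[1])
```

       With an unstable algorithm, the relative order of "pear" and "fig",
       or of "apple" and "kiwi", would not be guaranteed.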

       -M,--initial-maximum-records <records>
	      Set initial maximum number of records

       -m,--line-end-carriage-return
	      End-of-line  in  the  input  data	 is  marked by Carriage Return
	      (0x0D) as on the Macintosh rather than by Line Feed (0x0A) as on
	      Unix systems.

       -I,--invert-globally
	      Invert sense of comparisons globally

       -B,--BMP
              No characters fall outside the Basic Multilingual Plane (that
              is, none has a value greater than 0xFFFF).

       -Z,--skip-first-record
	      Copy the first record in the input to the output without sorting
	      it. This is useful for sorting files with a header.
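
       The behavior described above amounts to the following, shown as a
       small Python sketch with made-up line records:

```python
# The first record is a header and is copied through unsorted.
lines = ["name,qty", "pear,3", "apple,7", "fig,1"]
header, body = lines[0], lines[1:]
result = [header] + sorted(body)
```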

       -p,--reserve-private-use-area
	      Do  not  make internal use of the Private Use areas. By default,
	      multigraphs are assigned internally to codepoints in the Supple‐
	      mentary  Private Use areas if full Unicode is in use or to code‐
	      points in the Private Use area if input  is  restricted  to  the
	      Basic  Multilingual  Plane  by  means  of the -B option. If your
	      input makes use of the Private Use areas, this  option  prevents
	      interference  with your input. In this case, multigraphs will be
	      assigned to the Low and High  Surrogate  areas  (0xD800-0xDFFF).
	      Note that this limits the number of multigraphs to 2,048.

       -P,--random-seed <seed>
	      Set  the	seed for the random number generator. If not set here,
	      it is set to a value determined by the time. The	seed  used  is
	      reported in the log. This option allows runs to be replicated.

       -Q,--check-only
	      Check  whether  the input is already sorted. Do not generate any
	      output.  Exit status is 0 if input is already sorted, 11 if  not
	      sorted.
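
       A check of this kind can be sketched in Python as follows. The
       function name check_sorted is hypothetical, and treating tied
       neighbors as correctly ordered is an assumption; only the exit
       status values 0 and 11 come from the description above.

```python
def check_sorted(records, key=None):
    """Return 0 if records are in non-decreasing key order, otherwise 11."""
    keys = [key(r) if key else r for r in records]
    # Compare each key with its successor; assumption: ties count as sorted.
    in_order = all(a <= b for a, b in zip(keys, keys[1:]))
    return 0 if in_order else 11


status_sorted = check_sorted(["apple", "fig", "fig", "pear"])
status_unsorted = check_sorted(["pear", "apple"])
```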

       -1,--in <input file name>

       -2,--out <output file name>
	      If the output file is the same as the input file, the input file
	      will be overwritten. The input file will not be  overwritten  if
	      the run is unsuccessful.

       -j,--suppress-log
	      Suppress	output	to the log. If this flag is given before there
	      is any output to the log from a command line flag, nothing  will
	      be written to the log and the log file will not be created. If a
	      command line flag generates a log message before	this  flag  is
	      processed, the log file will be created but no log messages will
	      be written to it once this flag is processed. To guarantee  that
	      no  attempt  will	 be  made  to  open a log file, give this flag
	      first.

       -q,--quiet
	      Be quiet - do not chat while working

       -u,--unicode-normalization <mode>
	      Select Unicode normalization mode. The choices of	 mode  are:  c
	      for  normalization  form	C  (NFC),  d  for normalization form D
	      (NFD), C for normalization form KC (NFKC), D  for	 normalization
	      form KD (NFKD), and n for no normalization. The default is NFC.
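
       The four normalization forms are standard Unicode operations, so
       their effect can be shown with Python's unicodedata module. The
       mapping of mode letters to forms follows the list above; the
       function name normalize is hypothetical, and this is not msort's own
       code.

```python
import unicodedata

# Mode letters as listed above; "n" means no normalization.
MODES = {"c": "NFC", "d": "NFD", "C": "NFKC", "D": "NFKD"}


def normalize(text, mode="c"):
    """Apply the normalization form selected by mode, or none for 'n'."""
    if mode == "n":
        return text
    return unicodedata.normalize(MODES[mode], text)


composed = "\u00e9"     # e with acute accent as a single codepoint
decomposed = "e\u0301"  # e followed by a combining acute accent
```

       Without normalization, the two spellings of the same accented letter
       would compare as different strings.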

   Key specific options
       -e,--character-range <m,n>
	      Sort on characters m through n. Positive indices start from one.
	      Negative indices indicate position with respect to  the  end  of
	      the  record.   For example, the range 3,-2 consists of the third
	      character through the next-to-last character.
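
       The index arithmetic can be sketched in Python. The function name
       char_range is hypothetical, and the sketch assumes that -1 denotes
       the last character, as implied by -2 being the next-to-last.

```python
def char_range(record, m, n):
    """Return characters m through n, inclusive, using one-based indices.

    Negative indices count from the end of the record (-1 is the last
    character).
    """
    def to_index(pos):
        if pos == 0:
            raise ValueError("indices start from one; zero is not valid")
        return pos - 1 if pos > 0 else len(record) + pos

    start = to_index(m)
    end = to_index(n)
    return record[start:end + 1]


key = char_range("abcdefg", 3, -2)
```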

       -n,--position <POS>(,<POS>)
	      Sort on the specified POS or contiguous range of POSs,  where  a
	      POS  is  of  the	form <field number>(.<character number>). Both
	      counts begin at one.  Field numbers but  not  character  numbers
	      may  be negative, in which case they are counted from the right.
	      Thus, 1.2 is the second character of the first  field;  -2.1  is
	      the first character of the next to last field.
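
       The following Python sketch shows one way to resolve a POS into
       zero-based indices. The function name parse_pos and the sample
       fields are hypothetical; the sketch assumes that an omitted
       character number means the first character and that field -1 is the
       last field.

```python
def parse_pos(pos, field_count):
    """Convert '<field>(.<char>)' into zero-based (field, char) indices."""
    field_part, _, char_part = pos.partition(".")
    field = int(field_part)
    char = int(char_part) if char_part else 1  # assumption: default is 1
    if field == 0 or char < 1:
        raise ValueError("field is nonzero; character number is positive")
    # Negative field numbers are counted from the right.
    field_index = field - 1 if field > 0 else field_count + field
    return field_index, char - 1


fields = ["alpha", "beta", "gamma", "delta"]
f1, c1 = parse_pos("1.2", len(fields))
f2, c2 = parse_pos("-2.1", len(fields))
first_example = fields[f1][c1]
second_example = fields[f2][c2]
```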

       -t,--tag <tag regexp>
	      Sort on the field with the specified tag

       -o,--optional <comparison>
              The key is optional. The argument (<, = or >) specifies how a
              record lacking the key compares to records in which the key is
              present.

       -C,--fold-case
	      Fold case

       -z,--fold-case-turkic
	      Fold case with additional Turkic conversions.
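
       The difference between the two variants can be sketched in Python.
       The Turkic mappings used here (I to dotless i, and dotted capital I
       to i) are the ones marked "T" in the Unicode case folding data; the
       function name fold_case is hypothetical, and this is not msort's own
       implementation.

```python
def fold_case(text, turkic=False):
    """Full Unicode case folding, optionally with the Turkic mappings."""
    if turkic:
        # Turkic mappings: U+0049 (I) -> U+0131 (dotless i),
        # U+0130 (I with dot above) -> U+0069 (i).
        text = text.replace("I", "\u0131").replace("\u0130", "i")
    return text.casefold()


plain = fold_case("DIYARBAKIR")
turkic = fold_case("DIYARBAKIR", turkic=True)
```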

       -c,--comparison-type <comparison type>
              The choices are: a(ngle), l(exicographic), i(so8601 date/time),
              t(ime), D(omain name/email address), d(ate), m(onth name),
              n(umeric), N(umeric string), s(ize), h(ybrid), r(andom)

       -y,--number-system <number system>
              Specifies the number system expected for this key. This affects
              only numeric and numeric string keys. There are two special
              values. If the number system is "all", records may contain any
              number system that msort can interpret, and different records
              may contain different number systems. If the number system is
              "any", records may contain any number system that msort can
              interpret, but all records must use the same number system;
              msort sets the number system on the basis of the first record.

       -f,--date-format <date format>
	      Permutation of ymd with separators, e.g. y-m-d for international
	      date format, m/d/y for American date format, or a permutation of
	      yd with separators, e.g. y-d, for day-of-year dates.  All	 three
	      components  may  be  numbers in any available number system. The
	      month field may also be a month name,  determined	 by  the  same
	      devices as independent month name fields.
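
       To show how a format such as m/d/y determines the sort key, here is
       a simplified Python sketch. It handles numeric components only (no
       month names), ignores which separator characters are used, and the
       function name date_key is hypothetical.

```python
import re


def date_key(text, fmt):
    """Build a (year, month, day) sort key from text laid out as fmt.

    fmt is a permutation of y, m, d (or y, d) with separators,
    for example 'y-m-d', 'm/d/y' or 'y-d'.
    """
    letters = [ch for ch in fmt if ch in "ymd"]
    numbers = [int(part) for part in re.findall(r"\d+", text)]
    if len(letters) != len(numbers):
        raise ValueError("date does not match format")
    parts = dict(zip(letters, numbers))
    # Day-of-year dates have no month; use 0 so the day alone decides.
    return (parts["y"], parts.get("m", 0), parts["d"])


dates = ["12/25/2009", "01/05/2010", "07/04/2009"]
result = sorted(dates, key=lambda d: date_key(d, "m/d/y"))
```

       Sorting the same strings lexicographically would put 01/05/2010
       first, which is why the components are reordered into year, month,
       day before comparison.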

       -W,--sort-order-file-separators <file name>
              Read from the named file the list of characters to be treated as
              separators in the sort order definition file.

       -S,--substitutions <file name>
	      Read substitutions from named file

       -s,--sort-order <file name>|<locale name>|"locale"
	      If the argument is a file name, it is taken to be a  sort	 order
	      file  and	 the  sort order for the key is read from the file. If
	      the argument is a locale name,  the  collation  rules  for  that
	      locale  are  used.  If  the  argument is "locale", the collation
	      rules for the current locale are used.

       -T,--transformations <(d)(e)(s)>
	      Apply the specified transformations.  d specifies that  diacrit‐
	      ics  are to be stripped. Separately encoded combining diacritics
	      are removed. Characters with  diacritics	represented  by single
	      codepoints  are  replaced with the corresponding ASCII character
	      without the diacritics, if  there	 is  one.   e  specifies  that
	      enclosed	characters,  that  is,	characters  within  circles or
	      parentheses, are to be replaced  with  the  corresponding	 plain
	      ASCII character if there is one.	s specifies that characters in
	      special styles are to be replaced with the  corresponding	 plain
	      ASCII  character if there is one. Stylistic equivalents include:
	      small capitals (e.g. U+1D04), script forms (e.g. U+212C),	 black
	      letter  forms  (e.g.  U+212D),  Arabic  presentation forms (e.g.
	      U+FE81), Hebrew  presentation  forms  (e.g.  U+FB1D),  fullwidth
	      forms  (e.g.  U+FF01),  halfwidth	 forms	(e.g. U+FF7B), and the
	      mathematical alphanumeric symbols (e.g. U+1D400).
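
       The d and s transformations can be approximated with standard
       Unicode normalization, as in the Python sketch below. This is only
       an approximation under stated assumptions: msort may use its own
       mapping tables, compatibility normalization (NFKC) covers some but
       not all of the stylistic forms listed above, and the function names
       are hypothetical.

```python
import unicodedata


def strip_diacritics(text):
    """Decompose characters, then drop the combining diacritics."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def plain_style(text):
    """Replace compatibility variants (e.g. fullwidth, script) with plain forms."""
    return unicodedata.normalize("NFKC", text)


stripped = strip_diacritics("caf\u00e9 na\u00efve")
plain = plain_style("\uff21\u212c")  # fullwidth A, script capital B
```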

       -x,--exclusion-file <file name>
	      Read exclusions from named file

       -X,--exclude-characters <exclusions>
	      Exclude specified characters

       -i,--invert-locally
	      Invert sense of comparisons

       -R,--reverse-key
	      Reverse characters of key

       -A,--first-character-only
	      Ignore all but the first character of the field, after substitu‐
	      tions, exclusions, etc.

       Note: long options may not be available on your system.

SEE ALSO
       sort(1), uninum(3)

AUTHOR
       Bill Poser (billposer@alum.mit.edu)

LICENSE
       GNU General Public License (http://www.gnu.org/licenses/gpl.html), ver‐
       sion 3.

msort				 January 2010			      MSORT(1)