rwsort man page on DragonFly


rwsort(1)			SiLK Tool Suite			     rwsort(1)

NAME
       rwsort - Sort SiLK Flow records on one or more fields

SYNOPSIS
	 rwsort --fields=KEY [--presorted-input] [--reverse]
	       [--temp-directory=DIR_PATH] [--sort-buffer-size=SIZE]
	       [--note-add=TEXT] [--note-file-add=FILE]
	       [--compression-method=COMP_METHOD] [--print-filenames]
	       [--output-path=PATH] [--site-config-file=FILENAME]
	       [--plugin=PLUGIN [--plugin=PLUGIN ...]]
	       [--python-file=PATH [--python-file=PATH ...]]
	       [--pmap-file=MAPNAME:PATH [--pmap-file=MAPNAME:PATH ...]]
	       {[--input-pipe=PATH] | [--xargs]|[--xargs=FILE] | [FILES...]}

	 rwsort [--pmap-file=MAPNAME:PATH [--pmap-file=MAPNAME:PATH ...]]
	       [--plugin=PLUGIN ...] [--python-file=PATH ...] --help

	 rwsort [--pmap-file=MAPNAME:PATH [--pmap-file=MAPNAME:PATH ...]]
	       [--plugin=PLUGIN ...] [--python-file=PATH ...] --help-fields

	 rwsort --version

DESCRIPTION
       rwsort reads SiLK Flow records, sorts the records by the field(s)
       listed in the --fields switch, and writes the records to the
       --output-path or to the standard output if it is not connected to a
       terminal.  The output from rwsort is binary SiLK Flow records; the
       output must be passed into another tool for human-readable output.

       Sorting records is an expensive operation, and it should only be used
       when necessary.  The tools that bin flow records (rwcount(1),
       rwuniq(1), rwstats(1), etc.) do not require sorted data.

       rwsort reads SiLK Flow records from the files named on the command line
       or from the standard input when no file names are specified and neither
       --xargs nor --input-pipe is present.  To read the standard input in
       addition to the named files, use "-" or "stdin" as a file name.	If an
       input file name ends in ".gz", the file will be uncompressed as it is
       read.  When the --xargs switch is provided, rwsort will read the names
       of the files to process from the named text file, or from the standard
       input if no file name argument is provided to the switch.  The input to
       --xargs must contain one file name per line.  The --input-pipe switch
       is deprecated and is provided for legacy reasons; its use is not
       required since rwsort will automatically read from the standard input.
       The --input-pipe switch will be removed in the SiLK 4.0 release.

       The amount of fast memory used by rwsort will increase until it reaches
       a maximum near 2GB.  (Use the --sort-buffer-size switch to change this
       upper limit on the buffer size.)	 If more records are read than will
       fit into memory, the in-core records are sorted and temporarily stored
       on disk as described by the --temp-directory switch.  When all records
       have been read, the on-disk files are merged and the sorted records
       written to the output.
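
       The spill-and-merge behavior described above can be sketched in
       Python.  This is only an illustration of the general external merge
       sort technique, not rwsort's implementation: the function name
       external_sort, the record-count limit max_in_memory (rwsort limits
       the buffer by bytes, not by record count), and the use of pickle for
       the temporary files are all assumptions made for the sketch.

```python
import heapq
import pickle
import tempfile


def external_sort(records, key, max_in_memory, temp_dir=None):
    """Yield records in sorted order, spilling sorted runs to temp files.

    Hypothetical sketch: max_in_memory counts records, not bytes.
    """
    runs = []    # open temporary files, one per sorted run
    buffer = []  # the in-core records

    def spill():
        # Sort the in-core records and store them temporarily on disk.
        buffer.sort(key=key)
        run = tempfile.TemporaryFile(dir=temp_dir)
        for rec in buffer:
            pickle.dump(rec, run)
        run.seek(0)
        runs.append(run)
        buffer.clear()

    for rec in records:
        buffer.append(rec)
        if len(buffer) >= max_in_memory:
            spill()

    if not runs:
        # Everything fit in memory, so no temporary files are needed.
        buffer.sort(key=key)
        yield from buffer
        return

    if buffer:
        spill()

    def read_run(run):
        # Stream the records of one sorted run back from disk.
        while True:
            try:
                yield pickle.load(run)
            except EOFError:
                return

    try:
        # Merge the on-disk runs into a single sorted stream.
        yield from heapq.merge(*(read_run(r) for r in runs), key=key)
    finally:
        for run in runs:
            run.close()
```

       A larger in-memory limit produces fewer runs, which is why a larger
       sort buffer reduces the amount of temporary-file I/O.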

       By default, the temporary files are stored in the /tmp directory.
       Because these temporary files will be large, it is strongly recommended
       that /tmp not be used as the temporary directory.  To modify the
       temporary directory used by rwsort, provide the --temp-directory
       switch, set the SILK_TMPDIR environment variable, or set the TMPDIR
       environment variable.
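
       The order of precedence among these settings can be written out as a
       short Python sketch.  The function name resolve_temp_directory is
       hypothetical, and treating an empty value as "not set" is an
       assumption; the page only states which setting overrides which.

```python
import os


def resolve_temp_directory(cli_value=None, environ=None):
    """Return the temporary directory using the documented precedence.

    Order: --temp-directory, then SILK_TMPDIR, then TMPDIR, then /tmp.
    Assumption: an empty string is treated the same as "not set".
    """
    env = os.environ if environ is None else environ
    if cli_value:
        return cli_value
    for name in ("SILK_TMPDIR", "TMPDIR"):
        value = env.get(name)
        if value:
            return value
    return "/tmp"
```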

       To merge previously sorted SiLK data files into a sorted stream, run
       rwsort with the --presorted-input switch.  rwsort will merge-sort all
       the input files, reducing its memory requirements considerably.  It is
       the user's responsibility to ensure that all the input files have been
       sorted with the same --fields value (and --reverse if applicable).
       rwsort may still require use of a temporary directory while merging the
       files (for example, if rwsort does not have enough available file
       handles to open all the input files at once).
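
       One way a merge can proceed when not all inputs can be open at once
       is to merge them in batches and feed the intermediate results back
       into the merge.  The Python sketch below illustrates that idea only;
       it is not taken from rwsort's source.  The name merge_presorted and
       the max_open parameter are hypothetical, and in-memory lists stand in
       for both the input files and the temporary files.

```python
import heapq


def merge_presorted(inputs, key, max_open=4):
    """Merge already-sorted sequences, using at most max_open at a time.

    Hypothetical sketch: each intermediate list stands in for a
    temporary file written during a partial merge.
    """
    if max_open < 2:
        raise ValueError("max_open must be at least 2")
    pending = [list(seq) for seq in inputs]
    while len(pending) > max_open:
        # Merge one batch and queue the result for a later pass.
        batch, pending = pending[:max_open], pending[max_open:]
        pending.append(list(heapq.merge(*batch, key=key)))
    return list(heapq.merge(*pending, key=key))
```

       The result is only correct when every input is already sorted by the
       same key, which mirrors the responsibility placed on the user above.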

OPTIONS
       Option names may be abbreviated if the abbreviation is unique or is an
       exact match for an option.  A parameter to an option may be specified
       as --arg=param or --arg param, though the first form is required for
       options that take optional parameters.

       The --fields switch is required.	 rwsort will fail when it is not
       provided.

       --fields=KEY
	   KEY contains the list of flow attributes (a.k.a. fields or columns)
	   that make up the key by which flows are sorted.  The fields are
	   listed in order from primary sort key to secondary key, etc.  Each
	   field may be specified once only.  KEY is a comma separated list of
	   field-names, field-integers, and ranges of field-integers; a range
	   is specified by separating the start and end of the range with a
	   hyphen (-).	Field-names are case insensitive.  Example:

	    --fields=stime,10,1-5

	   There is no default value for the --fields switch; the switch must
	   be specified.
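
	   The KEY syntax can be illustrated with a small Python parser.
	   This is a sketch under stated assumptions, not rwsort's parser:
	   the name parse_fields is hypothetical, the FIELD_IDS table covers
	   only a subset of the built-in names listed below, and raising
	   ValueError for a repeated field is one plausible reading of "each
	   field may be specified once only".

```python
# Subset of the built-in field names and their integer identifiers.
FIELD_IDS = {
    "sip": 1, "dip": 2, "sport": 3, "dport": 4, "protocol": 5,
    "packets": 6, "pkts": 6, "bytes": 7, "flags": 8,
    "stime": 9, "duration": 10, "etime": 11, "sensor": 12,
}


def parse_fields(key):
    """Expand a KEY string into an ordered list of field integers."""
    result = []
    for token in key.split(","):
        token = token.strip()
        if not token:
            raise ValueError("empty field in KEY")
        name = token.lower()  # field names are case insensitive
        if name in FIELD_IDS:
            ids = [FIELD_IDS[name]]
        elif "-" in token:
            # A range of field-integers such as 1-5.
            start, end = (int(part) for part in token.split("-", 1))
            if start > end:
                raise ValueError("invalid range: " + token)
            ids = list(range(start, end + 1))
        else:
            ids = [int(token)]
        for field_id in ids:
            if field_id in result:
                raise ValueError("field given more than once: " + token)
            result.append(field_id)
    return result
```

	   With this sketch, the example key stime,10,1-5 expands to the
	   field integers 9, 10, 1, 2, 3, 4, 5, in that order.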

	   The complete list of built-in fields that the SiLK tool suite
	   supports follows, though note that not all fields are present in
	   all SiLK file formats; when a field is not present, its value is 0.

	   sIP,1
	       source IP address

	   dIP,2
	       destination IP address

	   sPort,3
	       source port for TCP and UDP, or equivalent

	   dPort,4
	       destination port for TCP and UDP, or equivalent.	 See note at
	       "iType".

	   protocol,5
	       IP protocol

	   packets,pkts,6
	       packet count

	   bytes,7
	       byte count

	   flags,8
	       bit-wise OR of TCP flags over all packets

	   sTime,9,sTime+msec,22
	       starting time of flow (milliseconds resolution)

	   duration,10,dur+msec,24
	       duration of flow (milliseconds resolution)

	   eTime,11,eTime+msec,23
	       end time of flow (milliseconds resolution)

	   sensor,12
	       name or ID of sensor where flow was collected

	   class,20,type,21
	       integer value of the class/type pair assigned to the flow by
	       rwflowpack(8)

	   iType
	       the ICMP type value for ICMP or ICMPv6 flows and zero for non-
	       ICMP flows.  Internally, SiLK stores the ICMP type and code in
	       the "dPort" field, so there is no need to have both "dPort" and
	       "iType" or "iCode" in the sort key.  This field was introduced
	       in SiLK 3.8.1.

	   iCode
	       the ICMP code value for ICMP or ICMPv6 flows and zero for non-
	       ICMP flows.  See note at "iType".

	   icmpTypeCode,25
	       equivalent to "iType","iCode".  This field may not be mixed
	       with "iType" or "iCode", and this field is deprecated as of
	       SiLK 3.8.1.  Prior to SiLK 3.8.1, specifying the "icmpTypeCode"
	       field was equivalent to specifying the "dPort" field.

	   Many SiLK file formats do not store the following fields and their
	   values will always be 0; they are listed here for completeness:

	   in,13
	       router SNMP input interface or vlanId if packing tools were
	       configured to capture it (see sensor.conf(5))

	   out,14
	       router SNMP output interface or postVlanId

	   nhIP,15
	       router next hop IP

	   SiLK can store flows generated by enhanced collection software that
	   provides more information than NetFlow v5.  These flows may support
	   some or all of these additional fields; for flows without this
	   additional information, the field's value is always 0.

	   initialFlags,26
	       TCP flags on first packet in the flow

	   sessionFlags,27
	       bit-wise OR of TCP flags over all packets except the first in
	       the flow

	   attributes,28
	       flow attributes set by the flow generator:

	       "S" all the packets in this flow record are exactly the same
		   size

	       "F" flow generator saw additional packets in this flow
		   following a packet with a FIN flag (excluding ACK packets)

	       "T" flow generator prematurely created a record for a long-
		   running connection due to a timeout.	 (When the flow
		   generator yaf(1) is run with the --silk switch, it will
		   prematurely create a flow and mark it with "T" if the byte
		   count of the flow cannot be stored in a 32-bit value.)

	       "C" flow generator created this flow as a continuation of a
		   long-running connection, where the previous flow for this
		   connection met a timeout (or a byte threshold in the case
		   of yaf).

	       Consider a long-running ssh session that exceeds the flow
	       generator's active timeout.  (This is the active timeout since
	       the flow generator creates a flow for a connection that still
	       has activity).  The flow generator will create multiple flow
	       records for this ssh session, each spanning some portion of the
	       total session.  The first flow record will be marked with a "T"
	       indicating that it hit the timeout.  The second through next-
	       to-last records will be marked with "TC" indicating that this
	       flow both timed out and is a continuation of a flow that timed
	       out.  The final flow will be marked with a "C", indicating that
	       it was created as a continuation of an active flow.
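
	       The marking pattern in this example can be expressed as a few
	       lines of Python.  The function name continuation_marks is
	       hypothetical, and returning an empty mark for a session that
	       fits in a single record is an assumption; the text above only
	       describes sessions that are split across several records.

```python
def continuation_marks(num_records):
    """Return the timeout/continuation marks for one split session.

    Assumption: a session that fits in a single record has no marks.
    """
    if num_records < 1:
        raise ValueError("a session has at least one record")
    if num_records == 1:
        return [""]
    # First record: "T"; middle records: "TC"; final record: "C".
    return ["T"] + ["TC"] * (num_records - 2) + ["C"]
```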

	   application,29
	       guess as to the content of the flow.  Some software that
	       generates flow records from packet data, such as yaf, will
	       inspect the contents of the packets that make up a flow and use
	       traffic signatures to label the content of the flow.  SiLK
	       calls this label the application; yaf refers to it as the
	       appLabel.  The application is the port number that is
	       traditionally used for that type of traffic (see the
	       /etc/services file on most UNIX systems).  For example, traffic
	       that the flow generator recognizes as FTP will have a value of
	       21, even if that traffic is being routed through the standard
	       HTTP/web port (80).

	   The following fields provide a way to label the IPs or ports on a
	   record.  These fields require external files to provide the mapping
	   from the IP or port to the label:

	   sType,16
	       categorize the source IP address as "non-routable", "internal",
	       or "external" and sort based on the category.  Uses the mapping
	       file specified by the SILK_ADDRESS_TYPES environment variable,
	       or the address_types.pmap mapping file, as described in
	       addrtype(3).

	   dType,17
	       as sType for the destination IP address

	   scc,18
	       the country code of the source IP address.  Uses the mapping
	       file specified by the SILK_COUNTRY_CODES environment variable,
	       or the country_codes.pmap mapping file, as described in
	       ccfilter(3).

	   dcc,19
	       as scc for the destination IP

	   src-MAPNAME
	       value determined by passing the source IP or the
	       protocol/source-port to the user-defined mapping defined in the
	       prefix map associated with MAPNAME.  See the description of the
	       --pmap-file switch below and the pmapfilter(3) manual page.

	   dst-MAPNAME
	       as src-MAPNAME for the destination IP or
	       protocol/destination-port.

	   sval
	   dval
	       These are deprecated field names created by pmapfilter that
	       correspond to src-MAPNAME and dst-MAPNAME, respectively.	 These
	       fields are available when a prefix map is used that is not
	       associated with a MAPNAME.

	   Finally, the list of built-in fields may be augmented by the run-
	   time loading of PySiLK code or plug-ins written in C (also called
	   shared object files or dynamic libraries), as described by the
	   --python-file and --plugin switches.

       --presorted-input
	   Instruct rwsort to merge-sort the input files; that is, rwsort
	   assumes the input files have been previously sorted using the same
	   values for the --fields and --reverse switches as was given for
	   this invocation.  This switch can greatly reduce rwsort's memory
	   requirements as a large buffer is not required for sorting the
	   records.  If the input files were created with rwsort, you can run
	   rwfileinfo(1) on the files to see the rwsort invocation that
	   created them.

       --reverse
	   Cause rwsort to reverse the sort order, causing larger values to
	   occur in the output before smaller values.  Normally smaller values
	   appear before larger values.
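
	   The effect of a multi-field key with and without --reverse can be
	   illustrated in Python.  This sketch assumes records are plain
	   dictionaries and that the reversal applies to the whole key; the
	   function name sort_records is hypothetical and the code does not
	   read SiLK files.

```python
from operator import itemgetter


def sort_records(records, fields, reverse=False):
    """Sort dict records on the listed fields, primary key first."""
    return sorted(records, key=itemgetter(*fields), reverse=reverse)
```

	   For example, sorting on ["dport", "sip"] groups records by
	   destination port and orders each group by source IP; passing
	   reverse=True places the larger values first instead.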

       --plugin=PLUGIN
	   Augment the list of fields by using run-time loading of the plug-in
	   (shared object) whose path is PLUGIN.  The switch may be repeated
	   to load multiple plug-ins.  The creation of plug-ins is described
	   in the silk-plugin(3) manual page.  When PLUGIN does not contain a
	   slash ("/"), rwsort will attempt to find a file named PLUGIN in the
	   directories listed in the "FILES" section.  If rwsort finds the
	   file, it uses that path.  If PLUGIN contains a slash or if rwsort
	   does not find the file, rwsort relies on your operating system's
	   dlopen(3) call to find the file.  When the SILK_PLUGIN_DEBUG
	   environment variable is non-empty, rwsort prints status messages to
	   the standard error as it attempts to find and open each of its
	   plug-ins.

       --temp-directory=DIR_PATH
	   Specify the name of the directory in which to store data files
	   temporarily when more records have been read than will fit into
	   RAM.  This switch overrides the directory specified in the
	   SILK_TMPDIR environment variable, which overrides the directory
	   specified in the TMPDIR variable, which overrides the default,
	   /tmp.

       --sort-buffer-size=SIZE
	   Set the maximum size of the buffer used for sorting the records, in
	   bytes.  A larger buffer means fewer temporary files need to be
	   created, reducing the I/O wait times.  When this switch is not
	   specified, the default maximum for this buffer is near 2GB.	The
	   SIZE may be given as an ordinary integer, or as a real number
	   followed by a suffix "K", "M" or "G", which represents the
	   numerical value multiplied by 1,024 (kilo), 1,048,576 (mega), and
	   1,073,741,824 (giga), respectively.	For example, 1.5K represents
	   1,536 bytes, or one and one-half kilobytes.	(This value does not
	   represent the absolute maximum amount of RAM that rwsort will
	   allocate, since additional buffers will be allocated for reading
	   the input and writing the output.)  The sort buffer is not used
	   when the --presorted-input switch is specified.
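
	   The SIZE arithmetic can be checked with a short Python sketch.
	   The function name parse_buffer_size is hypothetical, and two
	   details are assumptions: only the upper-case suffixes named above
	   are accepted, and a fractional byte count is truncated.

```python
_MULTIPLIERS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_buffer_size(text):
    """Convert a SIZE string such as "1.5K" to a number of bytes."""
    text = text.strip()
    if text and text[-1] in _MULTIPLIERS:
        # A real number followed by a K, M, or G suffix.
        value = float(text[:-1]) * _MULTIPLIERS[text[-1]]
    else:
        # An ordinary integer number of bytes.
        value = int(text)
    return int(value)
```

	   Here parse_buffer_size("1.5K") gives 1536, matching the example in
	   the text.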

       --note-add=TEXT
	   Add the specified TEXT to the header of the output file as an
	   annotation.	This switch may be repeated to add multiple
	   annotations to a file.  To view the annotations, use the
	   rwfileinfo(1) tool.

       --note-file-add=FILENAME
	   Open FILENAME and add the contents of that file to the header of
	   the output file as an annotation.	This switch may be repeated to
	   add multiple annotations.  Currently the application makes no
	   effort to ensure that FILENAME contains text; be careful that you
	   do not attempt to add a SiLK data file as an annotation.

       --compression-method=COMP_METHOD
	   Specify how to compress the output.	When this switch is not given,
	   output to the standard output or to named pipes is not compressed,
	   and output to files is compressed using the default chosen when
	   SiLK was compiled.  The valid values for COMP_METHOD are determined
	   by which external libraries were found when SiLK was compiled.  To
	   see the available compression methods and the default method, use
	   the --help or --version switch.  SiLK can support the following
	   COMP_METHOD values when the required libraries are available.

	   none
	       Do not compress the output using an external library.

	   zlib
	       Use the zlib(3) library for compressing the output, and always
	       compress the output regardless of the destination.  Using zlib
	       produces the smallest output files at the cost of speed.

	   lzo1x
	       Use the lzo1x algorithm from the LZO real time compression
	       library for compression, and always compress the output
	       regardless of the destination.  This compression provides good
	       compression with less memory and CPU overhead.

	   best
	       Use lzo1x if available, otherwise use zlib.  Only compress the
	       output when writing to a file.

       --print-filenames
	   Print to the standard error the names of input files as they are
	   opened.

       --output-path=PATH
	   Write the sorted SiLK Flow records to the file at PATH.  This
	   switch must not name an existing regular file.  When the standard
	   output is not a terminal and this switch is not provided or its
	   argument is "stdout", the sorted records are written to the
	   standard output.

       --site-config-file=FILENAME
	   Read the SiLK site configuration from the named file FILENAME.
	   When this switch is not provided, rwsort searches for the site
	   configuration file in the locations specified in the "FILES"
	   section.

       --input-pipe=PATH
	   Read the SiLK Flow records to be sorted from the named pipe at
	   PATH.  If PATH is "stdin" or "-", records are read from the
	   standard input.  Use of this switch is not required, since rwsort
	   will automatically read data from the standard input when no file
	   names are specified on the command line.  This switch is deprecated
	   and will be removed in the SiLK 4.0 release.

       --xargs
       --xargs=FILENAME
	   Causes rwsort to read file names from FILENAME or from the standard
	   input if FILENAME is not provided.  The input should have one file
	   name per line.  rwsort will open each file in turn and read records
	   from it, as if the files had been listed on the command line.

       --help
	   Print the available options and exit.  Specifying switches that add
	   new fields or additional switches before --help will allow the
	   output to include descriptions of those fields or switches.

       --help-fields
	   Print the description and alias(es) of each field and exit.
	   Specifying switches that add new fields before --help-fields will
	   allow the output to include descriptions of those fields.

       --version
	   Print the version number and information about how SiLK was
	   configured, then exit the application.

       --pmap-file=MAPNAME:PATH
       --pmap-file=PATH
	   Instruct rwsort to load the mapping file located at PATH and create
	   the src-MAPNAME and dst-MAPNAME fields.  When MAPNAME is provided
	   explicitly, it will be used to refer to the fields specific to that
	   prefix map.	If MAPNAME is not provided, rwsort will check the
	   prefix map file to see if a map-name was specified when the file
	   was created.	 If no map-name is available, rwsort creates the
	   fields sval and dval.  Multiple --pmap-file switches are supported
	   as long as each uses a unique value for map-name.  The --pmap-file
	   switch(es) must precede the --fields switch.	 For more information,
	   see pmapfilter(3).

       --python-file=PATH
	   When the SiLK Python plug-in is used, rwsort reads the Python code
	   from the file PATH to define additional fields that can be used as
	   part of the sort key.  This file should call register_field() for
	   each field it wishes to define.  For details and examples, see the
	   silkpython(3) and pysilk(3) manual pages.

LIMITATIONS
       When the temporary files and the final output are stored on the same
       file volume, rwsort will require approximately twice as much free disk
       space as the size of data to be sorted.

       When the temporary files and the final output are on different volumes,
       rwsort will require between 1 and 1.5 times as much free space on the
       temporary volume as the size of the data to be sorted.
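
       These rules of thumb can be turned into a quick estimate.  The Python
       sketch below simply restates the two cases above as a (low, high)
       range in bytes; the function name temp_space_estimate is hypothetical
       and the figures are the approximations given in this section, not
       measured values.

```python
def temp_space_estimate(data_bytes, same_volume):
    """Return a (low, high) estimate of the free space needed, in bytes.

    same_volume=True: temporary files and output share one volume, and
    the estimate covers that shared volume.  same_volume=False: the
    estimate covers the temporary volume only.
    """
    if same_volume:
        return (2 * data_bytes, 2 * data_bytes)
    return (data_bytes, data_bytes * 3 // 2)
```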

EXAMPLES
       In the following examples, the dollar sign ("$") represents the shell
       prompt.	The text after the dollar sign represents the command line.

       To sort the records in infile.rw based primarily on destination port
       and secondarily on source IP and write the binary output to outfile.rw,
       run:

	$ rwsort --fields=dport,sip --output-path=outfile.rw infile.rw

       The silkpython(3) manual page provides examples that use PySiLK to
       create arbitrary fields to use as part of the key for rwsort.

ENVIRONMENT
       SILK_TMPDIR
	   When set and --temp-directory is not specified, rwsort writes the
	   temporary files it creates to this directory.  SILK_TMPDIR
	   overrides the value of TMPDIR.

       TMPDIR
	   When set and SILK_TMPDIR is not set, rwsort writes the temporary
	   files it creates to this directory.

       PYTHONPATH
	   This environment variable is used by Python to locate modules.
	   When --python-file is specified, rwsort must load the Python files
	   that comprise the PySiLK package, such as silk/__init__.py.	If
	   this silk/ directory is located outside Python's normal search path
	   (for example, in the SiLK installation tree), it may be necessary
	   to set or modify the PYTHONPATH environment variable to include the
	   parent directory of silk/ so that Python can find the PySiLK
	   module.

       SILK_PYTHON_TRACEBACK
	   When set, Python plug-ins will output traceback information on
	   Python errors to the standard error.

       SILK_COUNTRY_CODES
	   This environment variable allows the user to specify the country
	   code mapping file that rwsort uses when computing the scc and dcc
	   fields.  The value may be a complete path or a file relative to the
	   SILK_PATH.  See the "FILES" section for standard locations of this
	   file.

       SILK_ADDRESS_TYPES
	   This environment variable allows the user to specify the address
	   type mapping file that rwsort uses when computing the sType and
	   dType fields.  The value may be a complete path or a file relative
	   to the SILK_PATH.  See the "FILES" section for standard locations
	   of this file.

       SILK_CLOBBER
	   The SiLK tools normally refuse to overwrite existing files.
	   Setting SILK_CLOBBER to a non-empty value removes this restriction.

       SILK_CONFIG_FILE
	   This environment variable is used as the value for the
	   --site-config-file when that switch is not provided.

       SILK_DATA_ROOTDIR
	   This environment variable specifies the root directory of the data
	   repository.  As described in the "FILES" section, rwsort may use
	   this environment variable when searching for the SiLK site
	   configuration file.

       SILK_PATH
	   This environment variable gives the root of the install tree.  When
	   searching for configuration files and plug-ins, rwsort may use this
	   environment variable.  See the "FILES" section for details.

       SILK_PLUGIN_DEBUG
	   When set to 1, rwsort prints status messages to the standard error
	   as it attempts to find and open each of its plug-ins.  In addition,
	   when an attempt to register a field fails, the application prints a
	   message specifying the additional function(s) that must be defined
	   to register the field in the application.  Be aware that the output
	   can be rather verbose.

       SILK_TEMPFILE_DEBUG
	   When set to 1, rwsort prints debugging messages to the standard
	   error as it creates, re-opens, and removes temporary files.

FILES
       ${SILK_ADDRESS_TYPES}
       ${SILK_PATH}/share/silk/address_types.pmap
       ${SILK_PATH}/share/address_types.pmap
       /usr/local/share/silk/address_types.pmap
       /usr/local/share/address_types.pmap
	   Possible locations for the address types mapping file required by
	   the sType and dType fields.

       ${SILK_CONFIG_FILE}
       ${SILK_DATA_ROOTDIR}/silk.conf
       /data/silk.conf
       ${SILK_PATH}/share/silk/silk.conf
       ${SILK_PATH}/share/silk.conf
       /usr/local/share/silk/silk.conf
       /usr/local/share/silk.conf
	   Possible locations for the SiLK site configuration file which are
	   checked when the --site-config-file switch is not provided.

       ${SILK_COUNTRY_CODES}
       ${SILK_PATH}/share/silk/country_codes.pmap
       ${SILK_PATH}/share/country_codes.pmap
       /usr/local/share/silk/country_codes.pmap
       /usr/local/share/country_codes.pmap
	   Possible locations for the country code mapping file required by
	   the scc and dcc fields.

       ${SILK_PATH}/lib64/silk/
       ${SILK_PATH}/lib64/
       ${SILK_PATH}/lib/silk/
       ${SILK_PATH}/lib/
       /usr/local/lib64/silk/
       /usr/local/lib64/
       /usr/local/lib/silk/
       /usr/local/lib/
	   Directories that rwsort checks when attempting to load a plug-in.

       ${SILK_TMPDIR}/
       ${TMPDIR}/
       /tmp/
	   Directory in which to create temporary files.

SEE ALSO
       rwcut(1), rwfileinfo(1), rwstats(1), rwuniq(1), addrtype(3),
       ccfilter(3), pmapfilter(3), pysilk(3), silkpython(3), silk-plugin(3),
       sensor.conf(5), rwflowpack(8), silk(7), yaf(1), dlopen(3), zlib(3)

NOTES
       If an output path is not specified, rwsort will write to the standard
       output unless it is connected to a terminal, in which case an error is
       printed and rwsort exits.

       If an input pipe or a set of input files are not specified, rwsort will
       read records from the standard input unless it is connected to a
       terminal, in which case an error is printed and rwsort exits.

       Note that rwsort produces binary output.	 Use rwcut(1) to view the
       records.

       Do not spend the resources to sort the data if you are going to be
       passing it to an aggregation tool like rwtotal or rwaddrcount, which
       have their own internal data structures that will ignore the sorted
       data.

       Both rwuniq(1) and rwstats(1) can take advantage of previously sorted
       data, but you must explicitly inform them that the input is sorted by
       providing the --presorted-input switch.

SiLK 3.11.0.1			  2016-02-19			     rwsort(1)