hxpipe man page on DragonFly

HXPIPE(1)			HTML-XML-utils			     HXPIPE(1)

NAME
       hxpipe - convert XML file to a format easier to parse with Perl or AWK

SYNOPSIS
       hxpipe [ -l ] [ -- ] [ file-or-URL ]

DESCRIPTION
       hxpipe parses an HTML or XML file and outputs a line-oriented represen‐
       tation of it that is well suited to further processing with AWK or sim‐
       ilar tools. The format is similar to the ESIS (Element Structure Infor‐
       mation Set) that is output by nsgmls/onsgmls.

       The reverse operation, converting back to mark-up, is performed by  the
       hxunpipe program.

       The output format is as follows:

       <!--comment-->
		 Comments are output as

		     *comment

		 I.e., a single line starting with "*" followed by the text of
		 the comment. Line feeds, carriage returns  and	 tabs  in  the
		 text  are  written as "\n", "\r" and "\t", respectively. Text
		 that looks like a numerical character entity is written  with
		 the "&" replaced by "\".  The line ends with a line feed.

		 Note  that  onsgmls  outputs  comments	 starting  with	 a "_"
		 instead of a "*" and doesn't replace  the  "&"	 of  numerical
		 character  entities  by "\" (and by default it omits comments
		 altogether).

       <?processing instruction>
		 Processing instructions are output as

		     ?processing instruction

		 I.e., a single line starting with a "?" followed by the  text
		 of  the  processing  instruction.  The text is escaped as for
		 comments (see above).

       <!DOCTYPE root PUBLIC "-//foo//DTD bar//EN" "http://example.org/dtd">
		 DOCTYPEs are output as one of the following:

		     !root "-//foo//DTD bar//EN" http://example.org/dtd
		     !root "-//foo//DTD bar//EN"
		     !root "" http://example.org/dtd
		     !root ""

		 for, respectively, a DOCTYPE with (1) both a public and a system
		 identifier, (2) only a public identifier, (3) only a system
		 identifier, or (4) neither of the two. I.e., a single line
		 starting with "!" followed by the name of the root element, a
		 space and a possibly empty quoted string, optionally followed by
		 a space and arbitrary text. Note the quotes around the public
		 identifier and the absence of quotes around the system identifier.

       <elt att1="value1" att2="value2">
		 A start tag is output as

		     Aatt1 CDATA value1
		     Aatt2 CDATA value2
		     (elt

		 I.e., as zero or more lines for the attributes and  one  line
		 for  the element type. Each line for an attribute starts with
		 "A" followed by the name of the attribute, a space, the  lit‐
		 eral  string "CDATA", another space, and the attribute value.
		 The text of the attribute value is escaped  as	 for  comments
		 (see  above).	The  line for the element type starts with "("
		 followed by the element type.

		 hxpipe does not read DTDs and	assumes	 that  attributes  are
		 always CDATA. It never generates other types (IMPLIED, TOKEN,
		 ID, etc.), unlike onsgmls.

       </elt>	 End tags are output as

		     )elt

		 I.e., as a line starting with ")"  followed  by  the  element
		 type.

       <empty att1="val1" att2="val2"/>
		 Empty elements (in XML) are output as

		     Aatt1 CDATA val1
		     Aatt2 CDATA val2
		     |empty

		 I.e.,	as  zero  or  more  lines  for attributes and one line
		 starting with "|" followed by the element type.

		 Note that onsgmls never outputs "|". (However, it can option‐
		 ally output a line consisting of a single "e" just before the
		 "(" line, to indicate that the element is empty.)

       text	 Text is output as

		     -text

		 I.e., as a single line starting  with	a  "-".	 The  text  is
		 escaped as for comments (see above).

       line numbers
		 When  the -l option is in effect, hxpipe will intersperse the
		 output with lines of the form

		     L12

		 where "12" is replaced with the line  number  in  the	source
		 where the next output came from.
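
       The line types described above can be read back with a small
       dispatcher keyed on the first character of each line. The following
       is a minimal sketch in Python, not part of html-xml-utils: the names
       unescape and events and the tuple layout are assumptions made for
       illustration, and only the escapes documented above are undone (any
       other backslash sequence is passed through unchanged).

```python
import re

# Escapes documented above: \n, \r, \t, and "\#...;" for numeric character
# references whose "&" was replaced by "\".
_ESCAPE = re.compile(r"\\(n|r|t|#[0-9]+;|#[xX][0-9A-Fa-f]+;)")
_SIMPLE = {"n": "\n", "r": "\r", "t": "\t"}
# Root element name, quoted public identifier, optional system identifier.
_DOCTYPE = re.compile(r'^(\S+) "([^"]*)"(?: (.*))?$')


def unescape(text):
    """Undo the documented escapes; anything else is left untouched."""
    return _ESCAPE.sub(
        lambda m: _SIMPLE.get(m.group(1), "&" + m.group(1)), text)


def events(lines):
    """Yield (source_line, kind, name_or_text, attrs) for each output line.

    source_line stays None unless the input contains "L" lines (-l option).
    """
    attrs = {}
    source_line = None
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        tag, rest = line[0], line[1:]
        if tag == "L":
            source_line = int(rest)
        elif tag == "A":
            # "Aname CDATA value": the value may itself contain spaces.
            parts = rest.split(" ", 2)
            attrs[parts[0]] = unescape(parts[2]) if len(parts) > 2 else ""
        elif tag in "(|":
            # Attribute lines precede the element line they belong to.
            yield source_line, "start" if tag == "(" else "empty", rest, attrs
            attrs = {}
        elif tag == ")":
            yield source_line, "end", rest, {}
        elif tag == "-":
            yield source_line, "text", unescape(rest), {}
        elif tag == "*":
            yield source_line, "comment", unescape(rest), {}
        elif tag == "?":
            yield source_line, "pi", unescape(rest), {}
        elif tag == "!":
            m = _DOCTYPE.match(rest)
            if m:
                ids = {"public": m.group(2), "system": m.group(3)}
                yield source_line, "doctype", m.group(1), ids
            else:
                yield source_line, "doctype", rest, {}
```

       Feeding the lines of hxpipe's output to events() gives one tuple per
       comment, processing instruction, DOCTYPE, tag or text line, with the
       attributes of a start tag or empty element collected into a
       dictionary.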

       hxpipe does not normalize the input and does not add missing tags. It is
       thus possible that there are unequal numbers of "(" and ")"  lines.  If
       it is important that every start tag is matched by an end tag, pipe the
       input through hxnormalize -x first.
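
       Whether a given output is balanced can be checked before further
       processing. The sketch below is an illustration in Python, not part
       of html-xml-utils; the function name unbalanced and its return
       value are assumptions. It keeps a stack of open element types and
       reports start tags that were never closed and end tags that did not
       match the innermost open element.

```python
def unbalanced(lines):
    """Return (unclosed_start_tags, unmatched_end_tags) for hxpipe output."""
    stack = []   # element types opened by "(" lines and not yet closed
    stray = []   # element types from ")" lines with no matching "(" on top
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("("):
            stack.append(line[1:])
        elif line.startswith(")"):
            if stack and stack[-1] == line[1:]:
                stack.pop()
            else:
                stray.append(line[1:])
    return stack, stray
```

       Two empty lists mean that every "(" line has a matching ")" line in
       the expected order; anything else suggests running the input through
       hxnormalize -x first.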

OPTIONS
       The following options are supported:

       -l	 Add "L" lines to the output to indicate the line  numbers  in
		 the source.

OPERANDS
       The following operand is supported:

       file-or-URL
		 The name or URL of an HTML or XML file. If absent, standard
		 input is read instead.

EXIT STATUS
       The following exit values are returned:

       0	 Successful completion.

       > 0	 An error occurred while parsing the input file. hxpipe will
		 try to correct the error and produce output anyway.
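
       Because output may still be produced when the exit status is greater
       than zero, a calling script may want to keep both. The following is
       a minimal sketch in Python using the standard subprocess module; the
       helper names build_command and run_hxpipe are assumptions, and the
       sketch assumes that hxpipe is on the PATH and that its output can be
       decoded with the locale's default encoding.

```python
import subprocess


def build_command(source=None, line_numbers=False):
    """Build an argument list following the SYNOPSIS above."""
    cmd = ["hxpipe"]
    if line_numbers:
        cmd.append("-l")
    if source is not None:
        # "--" ends option parsing, so odd file names are taken literally.
        cmd += ["--", source]
    return cmd


def run_hxpipe(source=None, line_numbers=False, stdin_text=None):
    """Run hxpipe and return (exit_status, output_lines).

    check=False is deliberate: a non-zero status signals a parse error,
    but the output is kept and returned anyway.
    """
    proc = subprocess.run(
        build_command(source, line_numbers),
        input=stdin_text,
        capture_output=True,
        text=True,
        check=False,
    )
    return proc.returncode, proc.stdout.splitlines()
```

       A caller could, for example, log a warning when the returned status
       is non-zero and still process the returned lines.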

ENVIRONMENT
       To  use a proxy to retrieve remote files, set the environment variables
       http_proxy and ftp_proxy.  E.g., http_proxy="http://localhost:8080/"

BUGS
       The error recovery for incorrect HTML is primitive. hxpipe can
       currently only retrieve remote files over HTTP. It doesn't handle
       password-protected files, nor files whose content depends on HTTP
       "cookies."

SEE ALSO
       hxunpipe(1), onsgmls(1).

6.x				  10 Jul 2011			     HXPIPE(1)