Parse::Token man page on Fedora

Parse::Token(3)	      User Contributed Perl Documentation      Parse::Token(3)

NAME
       "Parse::Token" - Definition of tokens used by "Parse::Lex"

SYNOPSIS
	       require 5.005;

	       use Parse::Lex;
	       @token = qw(
		   ADDOP    [-+]
		   INTEGER  [1-9][0-9]*
		  );

	       $lexer = Parse::Lex->new(@token);
	       $lexer->from(\*DATA);

	       $content = $INTEGER->next;
	       if ($INTEGER->status) {
		 print "$content\n";
	       }
	       $content = $ADDOP->next;
	       if ($ADDOP->status) {
		 print "$content\n";
	       }
	       if ($INTEGER->isnext(\$content)) {
		 print "$content\n";
	       }
	       __END__
	       1+2

DESCRIPTION
       The "Parse::Token" class and its derived classes permit defining the
       tokens used by "Parse::Lex" or "Parse::LexEvent".

       The creation of tokens can be done by means of the "new()" or
       "factory()" methods.  The "new()" method of the "Parse::Lex"
       package indirectly creates instances of the tokens to be recognized.

       The "next()" and "isnext()" methods of the "Parse::Token" package
       permit interfacing the lexical analyzer with a recursive-descent
       parser.  For interfacing with "byacc", see the "Parse::YYLex"
       package.
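
       The following is a minimal sketch of how "next", "isnext" and
       "status" fit into a recursive-descent parser.  It uses only core
       Perl.  "MiniToken" is a hypothetical stand-in class written for
       this illustration; it is not the "Parse::Token" implementation and
       only mimics the three methods as they are described in this page.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a token class; NOT the Parse::Token code.
package MiniToken;

my $buffer = '';                      # shared input, consumed from the front

sub from { $buffer = $_[1] }          # class method: set the text to analyze

sub new {
    my ($class, $name, $regex) = @_;
    return bless { name => $name, regex => qr/\A(?:$regex)/, status => 0 }, $class;
}

sub name   { $_[0]{name} }
sub status { $_[0]{status} }

# Try to consume this token at the front of the buffer.
# Returns the matched string and sets the status to true on success.
sub next {
    my ($self) = @_;
    $self->{status} = 0;
    if ($buffer =~ s/($self->{regex})//) {
        $self->{status} = 1;
        return $1;
    }
    return;
}

# Same search as next(); returns the status and stores the consumed
# string through the scalar reference, if one is given.
sub isnext {
    my ($self, $ref) = @_;
    my $text = $self->next;
    $$ref = $text if $self->{status} && ref $ref eq 'SCALAR';
    return $self->{status};
}

package main;

MiniToken->from('1+2');
my $INTEGER = MiniToken->new(INTEGER => '[1-9][0-9]*');
my $ADDOP   = MiniToken->new(ADDOP   => '[-+]');

# expr := INTEGER (ADDOP INTEGER)*
sub expr {
    my $value = $INTEGER->next;
    die "integer expected\n" unless $INTEGER->status;
    my ($op, $rhs);
    while ($ADDOP->isnext(\$op)) {
        $INTEGER->isnext(\$rhs) or die "integer expected\n";
        $value = $op eq '+' ? $value + $rhs : $value - $rhs;
    }
    return $value;
}

my $result = expr();
print "$result\n";
```

       Each grammar rule becomes one subroutine that asks specific tokens
       whether they come next, which is the style of interface that
       "next" and "isnext" are meant to support.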

       "Parse::Token" is included indirectly by means of "use Parse::Lex" or
       "use Parse::LexEvent".

Methods
       action
	   Returns the anonymous subroutine defined within the "Parse::Token"
	   object.

       factory LIST
       factory ARRAY_REF
	   The "factory(LIST)" method creates a list of tokens from a list of
	   specifications, which include for each token: a name, a regular
	   expression, and possibly an anonymous subroutine.  The list can
	   also include objects of class "Parse::Token" or of a class derived
	   from it.

	   The "factory(ARRAY_REF)" method permits creating tokens from
	   specifications of type attribute-value:

		   Parse::Token->factory([Type => 'Simple',
					  Name => 'EXAMPLE',
					  Regex => '.+']);

	   "Type" indicates the type of each token to be created (the package
	   prefix is not indicated).

	   "factory()" creates a series of tokens but does not import these
	   tokens into the calling package.

	   You could for example write:

		   %keywords =
		     qw (
			 PROC  undef
			 FUNC  undef
			 RETURN undef
			 IF    undef
			 ELSE  undef
			 WHILE undef
			 PRINT undef
			 READ  undef
			);
		   @tokens = Parse::Token->factory(%keywords);

	   and install these tokens in a symbol table in the following manner:

		   foreach $token (@tokens) {
		     $name = $token->name;   # pair each token with its own name
		     ${$name} = $token;
		     $symbol{"\L$name"} = [${$name}, ''];
		   }

	   "${$name}" is the token instance.

	   During the lexical analysis phase, you can use the tokens in the
	   following manner:

		   qw(IDENT [a-zA-Z][a-zA-Z0-9_]*),  sub {
		      $symbol{$_[1]} = [] unless defined $symbol{$_[1]};
		      my $type = $symbol{$_[1]}[0];
		      $lexer->setToken((not defined $type) ? $VAR : $type);
		      $_[1];  # THE TOKEN TEXT
		    }

	   This marks any symbol of unknown type as a variable.

	   In this example we have used $_[1] which corresponds to the text
	   recognized by the regular expression.  This text associated with
	   the token must be returned by the anonymous subroutine.

       get EXPR
	   "get" obtains the value of the attribute named by the result of
	   evaluating EXPR.  You can also use the name of the attribute as a
	   method name.

       getText
	   Returns the character string that was recognized by means of this
	   "Parse::Token" object.

	   Same as the text() method.

       isnext EXPR
       isnext
	   Searches for the token in the same way as "next" and returns the
	   status of the token. The consumed string is put into EXPR if it
	   is a reference to a scalar.

       name
	   Returns the name of the token.

       next
	   Activates the search for the lexeme defined by the regular
	   expression contained in the object. If this lexeme is recognized
	   in the character stream being analyzed, "next" returns the string
	   found and sets the status of the object to true.

       new SYMBOL_NAME, REGEXP, SUB
       new SYMBOL_NAME, REGEXP
	   Creates an object of type "Parse::Token::Simple" or
	   "Parse::Token::Segmented". The arguments of the "new()" method are,
	   respectively: a symbolic name, a regular expression, and possibly
	   an anonymous subroutine.  The subclasses of "Parse::Token" permit
	   specifying tokens by means of a list of attribute-values.

	   REGEXP is either a simple regular expression, or a reference to an
	   array containing from one to three regular expressions.  In the
	   first case, the instance belongs to the "Parse::Token::Simple"
	   class.  In the second case, the instance belongs to the
	   "Parse::Token::Segmented" class.  Tokens of this class recognize
	   delimited structures such as character strings within quotation
	   marks or comments in a C program.  The regular expressions are
	   used to recognize:

	   1. The beginning of the lexeme,

	   2. The "body" of the lexeme; if this second expression is missing,
	   "Parse::Lex" uses "(?:.*?)",

	   3. The end of the lexeme; if this last expression is missing then
	   the first one is used. (Note! The end of the lexeme cannot span
	   several lines).

	   Example:

		     qw(STRING), [qw(" (?:[^"\\\\]+|\\\\(?:.|\n))* ")],

	   These regular expressions can recognize multi-line strings
	   delimited by quotation marks, where the backslash is used to quote
	   the quotation marks appearing within the string. Notice the
	   quadrupling of the backslash.

	   Here is a variation of the previous example which uses the "s"
	   option to include newline in the characters recognized by ".":

		     qw(STRING), [qw(" (?s:[^"\\\\]+|\\\\.)* ")],

	   (Note: it is possible to write regular expressions which are more
	   efficient in terms of execution time, but this is not our objective
	   with this example.  See Mastering Regular Expressions.)
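
	   The effect of the quadrupled backslash can be checked with core
	   Perl alone.  The sketch below assumes that the three parts are
	   simply concatenated into one pattern; the code generated by
	   "Parse::Lex" may assemble them differently.  The variable names
	   are chosen for this illustration only.

```perl
use strict;
use warnings;

# Inside qw(), each pair of backslashes collapses to one backslash,
# so the four backslashes written here reach the regex engine as two,
# which is the pattern for one literal backslash.
my ($start, $body, $end) = qw(" (?:[^"\\\\]+|\\\\(?:.|\n))* ");

print "body: $body\n";

# Assumption: start, body and end are joined in that order.
my $string = qr/$start$body$end/;

# Input containing a string with backslash-escaped quotation marks.
my $input = q{print "a \"quoted\" word" ; next};

my ($match) = $input =~ /($string)/;
print "$match\n";
```

	   The match stops at the first quotation mark that is not preceded
	   by a backslash, so the escaped quotation marks stay inside the
	   recognized string.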

	   The anonymous subroutine is called when the lexeme is recognized by
	   the lexical analyzer. This subroutine takes two arguments: $_[0]
	   contains the token instance, and $_[1] contains the string
	   recognized by the regular expression. The scalar returned by the
	   anonymous subroutine becomes the character string stored in the
	   token instance.

	   In the anonymous subroutine you can use the positional variables
	   $1, $2, etc. which correspond to the groups of parentheses in the
	   regular expression.

       regexp
	   Returns the regular expression of the "Token" object.

       set LIST
	   Allows marking a token with a list of attribute-value pairs.

	   An attribute name can be used as a method name.

       setText EXPR
	   The value of "EXPR" defines the character string associated with
	   the lexeme.

	   Same as the "text(EXPR)" method.

       status EXPR
       status
	   Indicates if the last search of the lexeme succeeded or failed.
	   "status EXPR" overrides the existing value and sets it to the value
	   of EXPR.

       text EXPR
       text
	   "text()" returns the character string recognized by means of the
	   token. The value of "EXPR" sets the character string associated
	   with the lexeme.

       trace OUTPUT
       trace
	   Class method which activates/deactivates a trace of the lexical
	   analysis.

	   "OUTPUT" can be a file name or a reference to a filehandle to which
	   the trace will be directed.

Subclasses of Parse::Token
       Subclasses of the "Parse::Token" class are being defined.  They permit
       recognizing specific structures such as, for example, strings within
       double-quotes, C comments, etc.  Here are the subclasses which I am
       working on:

       "Parse::Token::Simple" : tokens of this class are defined by means of a
       single regular expression.

       "Parse::Token::Segmented" : tokens of this class are defined by means
       of three regular expressions.  Reading of new data is done
       automatically.

       "Parse::Token::Delimited" : permits recognizing, for example, C
       language comments.

       "Parse::Token::Quoted" : permits recognizing, for example, character
       strings within quotation marks.

       "Parse::Token::Nested" : permits recognizing nested structures such as
       parenthesized expressions.  NOT DEFINED.

       These classes are recently created and no doubt contain some bugs.

   Parse::Token::Action
       Tokens of the "Parse::Token::Action" class permit inserting arbitrary
       Perl expressions within a lexical analyzer.  An expression can be used
       for instance to print out internal variables of the analyzer:

       ·   $LEX_BUFFER : contents of the buffer to be analyzed

       ·   $LEX_LENGTH : length of the character string being analyzed

       ·   $LEX_RECORD : number of the record being analyzed

       ·   $LEX_OFFSET : number of characters already consumed since the start
	   of the analysis.

       ·   $LEX_POS : position reached by the analysis as a number of
	   characters since the start of the buffer.

       The class constructor accepts the following attributes:

       ·   "Name" : the name of the token

       ·   "Expr" : a Perl expression

       Example:

	       $ACTION = new Parse::Token::Action(
					     Name => 'ACTION',
					     Expr => q!print "LEX_POS: $LEX_POS\n" .
					     "LEX_BUFFER: $LEX_BUFFER\n" .
					     "LEX_LENGTH: $LEX_LENGTH\n" .
					     "LEX_RECORD: $LEX_RECORD\n" .
					     "LEX_OFFSET: $LEX_OFFSET\n"
					     ;!,
					    );

   Parse::Token::Simple
       The class constructor accepts the following attributes:

       ·   "Handler" : the value indicates the name of a function to call
	   during an analysis performed by an analyzer of class
	   "Parse::LexEvent".

       ·   "Name" : the associated value is the name of the token.

       ·   "Regex" : the associated value is a regular expression
	   corresponding to the pattern to be recognized.

       ·   "ReadMore" : if the associated value is 1, the recognition of the
	   token continues after reading a new record.  The strings recognized
	   are concatenated.  This attribute only has effect during analysis
	   of a character stream.

       ·   "Sub" : the associated value must be an anonymous subroutine to be
	   executed after the token is recognized.  This function is only used
	   with analyzers of class "Parse::Lex" or "Parse::CLex".

       Example:

	     new Parse::Token::Simple(Name => 'remainder',
				      Regex => '[^/\'\"]+',
				      ReadMore => 1);

   Parse::Token::Segmented
       The definition of these tokens includes three regular expressions.
       During analysis of a data stream, new data is read as long as the end
       of the token has not been reached.

       The class constructor accepts the following attributes:

       ·   "Handler" : the value indicates the name of a function to call
	   during analysis performed by an analyzer of class
	   "Parse::LexEvent".

       ·   "Name" : the associated value is the name of the token.

       ·   "Regex" : the associated value must be a reference to an array that
	   contains three regular expressions.

       ·   "Sub" : the associated value must be an anonymous subroutine to be
	   executed after the token is recognized.  This function is only used
	   with analyzers of class "Parse::Lex" or "Parse::CLex".

   Parse::Token::Quoted
       "Parse::Token::Quoted" is a subclass of "Parse::Token::Segmented".  It
       permits recognizing character strings within double quotes or single
       quotes.

       Examples.

	     ---------------------------------------------------------
	      Start    End	      Escaping
	     ---------------------------------------------------------
	       '	'	       ''
	       "	"	       ""
	       "	"	       \
	     ---------------------------------------------------------

       The class constructor accepts the following attributes:

       ·   "End" : The associated value is a regular expression permitting
	   recognizing the end of the token.

       ·   "Escape" : The associated value indicates the character used to
	   escape the delimiter.  By default, a double occurrence of the
	   terminating character escapes that character.

       ·   "Handler" : the value indicates the name of a function to be called
	   during an analysis performed by an analyzer of class
	   "Parse::LexEvent".

       ·   "Name" : the associated value is the name of the token.

       ·   "Start" : the associated value is a regular expression permitting
	   recognizing the start of the token.

       ·   "Sub" : the associated value must be an anonymous subroutine to be
	   executed after the token is recognized.  This function is only used
	   with analyzers of class "Parse::Lex" or "Parse::CLex".

       Example:

	     new Parse::Token::Quoted(Name => 'squotes',
				      Handler => 'string',
				      Escape => '\\',
				      Quote => qq!\'!,
				     );
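
       The two escaping conventions in the table above can be sketched
       with plain regular expressions.  "quoted_regex" is a hypothetical
       helper written for this illustration; it is not part of
       "Parse::Token::Quoted" and does not read new records the way a
       segmented token does.

```perl
use strict;
use warnings;

# Hypothetical helper: build a pattern for a quoted string from a quote
# character and an optional escape character.
sub quoted_regex {
    my ($quote, $escape) = @_;
    my $q = quotemeta $quote;
    if (defined $escape) {
        my $e = quotemeta $escape;
        # Body: any character that is neither quote nor escape,
        # or the escape character followed by any character.
        return qr/$q(?:[^$q$e]|$e.)*$q/s;
    }
    # Default convention: a doubled quote character stands for itself.
    return qr/$q(?:[^$q]|$q$q)*$q/;
}

my $by_doubling  = quoted_regex("'");
my $by_backslash = quoted_regex("'", '\\');

my ($doubled) = q{say 'it''s' now} =~ /($by_doubling)/;
my ($escaped) = q{say 'it\'s' now} =~ /($by_backslash)/;

print "$doubled\n";
print "$escaped\n";
```

       In both cases the whole quoted string, delimiters included, is
       matched as one lexeme.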

   Parse::Token::Delimited
       "Parse::Token::Delimited" is a subclass of "Parse::Token::Segmented".
       It permits, for example, recognizing C language comments.

       Examples.

	     ---------------------------------------------------------
	       Start   End     Constraint
			       on the contents
	     ---------------------------------------------------------
	       /*	*/			   C Comment
	       <!--	-->	 No '--'	   XML Comment
	       <!--	-->			   SGML Comment
	       <?	?>			   Processing instruction
						   in SGML/XML
	     ---------------------------------------------------------

       The class constructor accepts the following attributes:

       ·   "End" : The associated value is a regular expression permitting
	   recognizing the end of the token.

       ·   "Handler" : the value indicates the name of a function to be called
	   during an analysis performed by an analyzer of class
	   "Parse::LexEvent".

       ·   "Name" : the associated value is the name of the token.

       ·   "Start" : the associated value is a regular expression permitting
	   recognizing the start of the token.

       ·   "Sub" : the associated value must be an anonymous subroutine to be
	   executed after the token is recognized.  This function is only used
	   with analyzers of class "Parse::Lex" or "Parse::CLex".

       Example:

	     new Parse::Token::Delimited(Name => 'comment',
					 Start => '/[*]',
					 End => '[*]/'
					);
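
       As a rough check of the "Start" and "End" expressions used above,
       the sketch below joins them around a non-greedy body in core Perl.
       It assumes the whole input is already in one string, so the body
       is written as "(?s:.*?)" to let it span newlines; a segmented
       token instead reads new data until the end is found.

```perl
use strict;
use warnings;

# Start and End expressions taken from the example above.
my ($start, $end) = ('/[*]', '[*]/');

# Assumption: start, a non-greedy body, and end are simply concatenated.
my $comment = qr/$start(?s:.*?)$end/;

my $source = "a = 1; /* first\n   comment */ b = 2; /* second */";

# Collect every comment found in the source text.
my @comments = $source =~ /($comment)/g;

print scalar(@comments), " comments found\n";
print "$_\n" for @comments;
```

       The non-greedy body makes each match stop at the first "End"
       expression, so two adjacent comments are not merged into one.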

   Parse::Token::Nested - Not defined
       Examples.

	     ----------------------------------------------------------
	       Start   End
	     ----------------------------------------------------------
	       (	)		       Symbolic Expressions
	       {	}		       Rich Text Format Groups
	     ----------------------------------------------------------

BUGS
       The implementation of subclasses of tokens is not complete for
       analyzers of the "Parse::CLex" class.  I am not too keen to do it,
       since an implementation for classes "Parse::Lex" and "Parse::LexEvent"
       seems quite sufficient.

AUTHOR
       Philippe Verdret. Documentation translated to English by Vladimir
       Alexiev and Ocrat.

ACKNOWLEDGMENTS
       Version 2.0 owes much to suggestions made by Vladimir Alexiev.  Ocrat
       has significantly contributed to improving this documentation.  Thanks
       also to the numerous persons who have made comments or sometimes sent
       bug fixes.

REFERENCES
       Friedl, J.E.F. Mastering Regular Expressions. O'Reilly & Associates
       1996.

       Mason, T. & Brown, D. - Lex & Yacc. O'Reilly & Associates, Inc. 1990.

COPYRIGHT
       Copyright (c) 1995-1999 Philippe Verdret. All rights reserved. This
       module is free software; you can redistribute it and/or modify it under
       the same terms as Perl itself.

perl v5.14.0			  2010-03-26		       Parse::Token(3)