pcre man page on Aros

pcre man page on Aros
Man page or keyword search:
man Server 366 pages
apropos Keyword Search (all sections)
Output format
PCRE(3)								       PCRE(3)

NAME
       PCRE - Perl-compatible regular expressions

DESCRIPTION

       The  PCRE  library is a set of functions that implement regular expres‐
       sion pattern matching using the same syntax and semantics as Perl, with
       just  a	few  differences.  The current implementation of PCRE (release
       4.x) corresponds approximately with Perl	 5.8,  including  support  for
       UTF-8  encoded  strings.	  However,  this  support has to be explicitly
       enabled; it is not the default.

       PCRE is written in C and released as a C library. However, a number  of
       people  have  written  wrappers	and interfaces of various kinds. A C++
       class is included in these contributions, which can  be	found  in  the
       Contrib directory at the primary FTP site, which is:

       ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre

       Details	of  exactly which Perl regular expression features are and are
       not supported by PCRE are given in separate documents. See the pcrepat‐
       tern and pcrecompat pages.

       Some  features  of  PCRE can be included, excluded, or changed when the
       library is built. The pcre_config() function makes it  possible	for  a
       client  to  discover  which features are available. Documentation about
       building PCRE for various operating systems can be found in the	README
       file in the source distribution.

USER DOCUMENTATION

       The user documentation for PCRE has been split up into a number of dif‐
       ferent sections. In the "man" format, each of these is a separate  "man
       page".  In  the	HTML  format, each is a separate page, linked from the
       index page. In the plain text format, all  the  sections	 are  concate‐
       nated, for ease of searching. The sections are as follows:

	 pcre		   this document
	 pcreapi	   details of PCRE's native API
	 pcrebuild	   options for building PCRE
	 pcrecallout	   details of the callout feature
	 pcrecompat	   discussion of Perl compatibility
	 pcregrep	   description of the pcregrep command
	 pcrepattern	   syntax and semantics of supported
			     regular expressions
	 pcreperform	   discussion of performance issues
	 pcreposix	   the POSIX-compatible API
	 pcresample	   discussion of the sample program
	 pcretest	   the pcretest testing command

       In  addition,  in the "man" and HTML formats, there is a short page for
       each library function, listing its arguments and results.

LIMITATIONS

       There are some size limitations in PCRE but it is hoped that they  will
       never in practice be relevant.

       The  maximum  length of a compiled pattern is 65539 (sic) bytes if PCRE
       is compiled with the default internal linkage size of 2. If you want to
       process	regular	 expressions  that are truly enormous, you can compile
       PCRE with an internal linkage size of 3 or 4 (see the  README  file  in
       the  source  distribution and the pcrebuild documentation for details).
       If these cases the limit is substantially larger.  However,  the	 speed
       of execution will be slower.

       All values in repeating quantifiers must be less than 65536.  The maxi‐
       mum number of capturing subpatterns is 65535.

       There is no limit to the number of non-capturing subpatterns,  but  the
       maximum	depth  of  nesting  of	all kinds of parenthesized subpattern,
       including capturing subpatterns, assertions, and other types of subpat‐
       tern, is 200.

       The  maximum  length of a subject string is the largest positive number
       that an integer variable can hold. However, PCRE uses recursion to han‐
       dle  subpatterns	 and indefinite repetition. This means that the avail‐
       able stack space may limit the size of a subject	 string	 that  can  be
       processed by certain patterns.

UTF-8 SUPPORT

       Starting	 at  release  3.3,  PCRE  has  had  some support for character
       strings encoded in the UTF-8 format. For	 release  4.0  this  has  been
       greatly extended to cover most common requirements.

       In  order  process  UTF-8 strings, you must build PCRE to include UTF-8
       support in the code, and, in addition,  you  must  call	pcre_compile()
       with  the PCRE_UTF8 option flag. When you do this, both the pattern and
       any subject strings that are matched against it are  treated  as	 UTF-8
       strings instead of just strings of bytes.

       If  you compile PCRE with UTF-8 support, but do not use it at run time,
       the library will be a bit bigger, but the additional run time  overhead
       is  limited  to testing the PCRE_UTF8 flag in several places, so should
       not be very large.

       The following comments apply when PCRE is running in UTF-8 mode:

       1. When you set the PCRE_UTF8 flag, the strings passed as patterns  and
       subjects	 are  checked for validity on entry to the relevant functions.
       If an invalid UTF-8 string is passed, an error return is given. In some
       situations,  you	 may  already  know  that  your strings are valid, and
       therefore want to skip these checks in order to improve performance. If
       you  set	 the  PCRE_NO_UTF8_CHECK  flag at compile time or at run time,
       PCRE assumes that the pattern or subject	 it  is	 given	(respectively)
       contains	 only valid UTF-8 codes. In this case, it does not diagnose an
       invalid UTF-8 string. If you pass an invalid UTF-8 string to PCRE  when
       PCRE_NO_UTF8_CHECK  is set, the results are undefined. Your program may
       crash.

       2. In a pattern, the escape sequence \x{...}, where the contents of the
       braces  is  a  string  of hexadecimal digits, is interpreted as a UTF-8
       character whose code number is the given hexadecimal number, for	 exam‐
       ple:  \x{1234}.	If a non-hexadecimal digit appears between the braces,
       the item is not recognized.  This escape sequence can be used either as
       a literal, or within a character class.

       3.  The	original hexadecimal escape sequence, \xhh, matches a two-byte
       UTF-8 character if the value is greater than 127.

       4. Repeat quantifiers apply to complete UTF-8 characters, not to	 indi‐
       vidual bytes, for example: \x{100}{3}.

       5.  The dot metacharacter matches one UTF-8 character instead of a sin‐
       gle byte.

       6. The escape sequence \C can be used to match a single byte  in	 UTF-8
       mode, but its use can lead to some strange effects.

       7.  The	character escapes \b, \B, \d, \D, \s, \S, \w, and \W correctly
       test characters of any code value, but the characters that PCRE	recog‐
       nizes  as  digits,  spaces,  or	word characters remain the same set as
       before, all with values less than 256.

       8. Case-insensitive matching applies only to  characters	 whose	values
       are  less  than	256.  PCRE  does  not support the notion of "case" for
       higher-valued characters.

       9. PCRE does not support the use of Unicode tables  and	properties  or
       the Perl escapes \p, \P, and \X.

AUTHOR

       Philip Hazel <ph10@cam.ac.uk>
       University Computing Service,
       Cambridge CB2 3QG, England.
       Phone: +44 1223 334714

Last updated: 20 August 2003
Copyright (c) 1997-2003 University of Cambridge.

								       PCRE(3)
[top]

List of man pages available for Aros

Copyright (c) for man pages and the logo by the respective OS vendor.

For those who want to learn more, the polarhome community provides shell access and support.

[legal] [privacy] [GNU] [policy] [cookies] [netiquette] [sponsors] [FAQ]
Polarhome, production since 1999.
Member of Polarhome portal.
Based on Fawad Halim's script.
....................................................................
Vote for polarhome