pcre man page on YellowDog

PCRE(3)								       PCRE(3)

NAME
       PCRE - Perl-compatible regular expressions

INTRODUCTION

       The  PCRE  library is a set of functions that implement regular expres‐
       sion pattern matching using the same syntax and semantics as Perl, with
       just  a	few  differences.  The current implementation of PCRE (release
       6.x) corresponds approximately with Perl	 5.8,  including  support  for
       UTF-8 encoded strings and Unicode general category properties. However,
       this support has to be explicitly enabled; it is not the default.

       In addition to the Perl-compatible matching function,  PCRE  also  con‐
       tains  an  alternative matching function that matches the same compiled
       patterns in a different way. In certain circumstances, the  alternative
       function	 has  some  advantages.	 For  a discussion of the two matching
       algorithms, see the pcrematching page.

       PCRE is written in C and released as a C library. A  number  of	people
       have  written  wrappers and interfaces of various kinds. In particular,
       Google Inc.  have provided a comprehensive C++  wrapper.	 This  is  now
       included as part of the PCRE distribution. The pcrecpp page has details
       of this interface. Other people's contributions can  be	found  in  the
       Contrib directory at the primary FTP site, which is:

       ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre

       Details	of  exactly which Perl regular expression features are and are
       not supported by PCRE are given in separate documents. See the pcrepat‐
       tern and pcrecompat pages.

       Some  features  of  PCRE can be included, excluded, or changed when the
       library is built. The pcre_config() function makes it  possible	for  a
       client  to  discover  which  features are available. The features them‐
       selves are described in the pcrebuild page. Documentation about	build‐
       ing  PCRE for various operating systems can be found in the README file
       in the source distribution.

       The library contains a number of undocumented  internal	functions  and
       data  tables  that  are	used by more than one of the exported external
       functions, but which are not intended  for  use	by  external  callers.
       Their  names  all begin with "_pcre_", which hopefully will not provoke
       any name clashes. In some environments, it is possible to control which
       external	 symbols  are  exported when a shared library is built, and in
       these cases the undocumented symbols are not exported.

USER DOCUMENTATION

       The user documentation for PCRE comprises a number  of  different  sec‐
       tions.  In the "man" format, each of these is a separate "man page". In
       the HTML format, each is a separate page, linked from the  index	 page.
       In  the	plain text format, all the sections are concatenated, for ease
       of searching. The sections are as follows:

	 pcre		   this document
	 pcreapi	   details of PCRE's native C API
	 pcrebuild	   options for building PCRE
	 pcrecallout	   details of the callout feature
	 pcrecompat	   discussion of Perl compatibility
	 pcrecpp	   details of the C++ wrapper
	 pcregrep	   description of the pcregrep command
	 pcrematching	   discussion of the two matching algorithms
	 pcrepartial	   details of the partial matching facility
	 pcrepattern	   syntax and semantics of supported
			     regular expressions
	 pcreperform	   discussion of performance issues
	 pcreposix	   the POSIX-compatible C API
	 pcreprecompile	   details of saving and re-using precompiled patterns
	 pcresample	   discussion of the sample program
	 pcretest	   description of the pcretest testing command

       In addition, in the "man" and HTML formats, there is a short  page  for
       each C library function, listing its arguments and results.

LIMITATIONS

       There are some size limitations in PCRE, but it is hoped that they will
       never be relevant in practice.

       The maximum length of a compiled pattern is 65539 (sic) bytes  if  PCRE
       is compiled with the default internal linkage size of 2. If you want to
       process regular expressions that are truly enormous,  you  can  compile
       PCRE  with  an  internal linkage size of 3 or 4 (see the README file in
       the source distribution and the pcrebuild documentation	for  details).
       In  these  cases the limit is substantially larger.  However, the speed
       of execution will be slower.

       All values in repeating quantifiers must be less than 65536.  The maxi‐
       mum number of capturing subpatterns is 65535.

       There  is  no limit to the number of non-capturing subpatterns, but the
       maximum depth of nesting of  all	 kinds	of  parenthesized  subpattern,
       including capturing subpatterns, assertions, and other types of subpat‐
       tern, is 200.
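
       A  client that builds patterns programmatically might want to check the
       nesting depth before handing a pattern to PCRE.  The  following  is  a
       minimal  sketch  in  C,  not  part  of  the  PCRE  API:  the function name
       max_paren_depth and the constant MAX_NESTING_DEPTH are hypothetical, and
       the  scan  is deliberately simplified. It skips backslash escapes and the
       contents of [...] classes, but does not understand \Q...\E quoting,
       (?#...) comments, or POSIX classes nested inside a character class.

```c
#include <stddef.h>

/* Nesting limit quoted from the LIMITATIONS text; the macro name is
   an assumption made for this sketch, not a PCRE identifier. */
#define MAX_NESTING_DEPTH 200

/* Return the deepest level of unescaped '(' nesting in pattern.
   Simplified scan: backslash-escaped characters and the contents of
   [...] classes are skipped; other quoting constructs are not handled. */
int max_paren_depth(const char *pattern)
{
    int depth = 0;
    int deepest = 0;
    int in_class = 0;

    for (size_t i = 0; pattern[i] != '\0'; i++) {
        char c = pattern[i];

        if (c == '\\') {
            if (pattern[i + 1] == '\0')
                break;          /* trailing backslash: nothing to skip */
            i++;                /* skip the escaped character */
            continue;
        }
        if (in_class) {
            if (c == ']')
                in_class = 0;   /* end of character class */
            continue;
        }
        if (c == '[') {
            in_class = 1;
            /* A ']' directly after '[' or '[^' is a literal. */
            if (pattern[i + 1] == '^')
                i++;
            if (pattern[i + 1] == ']')
                i++;
            continue;
        }
        if (c == '(') {
            depth++;
            if (depth > deepest)
                deepest = depth;
        } else if (c == ')' && depth > 0) {
            depth--;
        }
    }
    return deepest;
}
```

       A caller could reject a pattern when max_paren_depth(pattern)  exceeds
       MAX_NESTING_DEPTH.  This is only a pre-check; PCRE's own diagnosis when
       the pattern is compiled remains the authority.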

       The maximum length of a subject string is the largest  positive	number
       that  an integer variable can hold. However, when using the traditional
       matching function, PCRE uses recursion to handle subpatterns and indef‐
       inite  repetition.  This means that the available stack space may limit
       the size of a subject string that can be processed by certain patterns.
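
       Because this limit is expressed in terms of an int, code that measures
       subjects  with  strlen(),  which  returns a size_t, may want to check the
       range before converting. A minimal sketch, using  a  hypothetical  helper
       name (subject_length_as_int) that is not part of PCRE:

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Store the length of subject in *len_out and return 1 if it fits in
   an int; return 0 (leaving *len_out untouched) if it does not. */
int subject_length_as_int(const char *subject, int *len_out)
{
    size_t n = strlen(subject);

    if (n > (size_t)INT_MAX)
        return 0;               /* too long to describe with an int */
    *len_out = (int)n;
    return 1;
}
```

       The  check  says  nothing about stack usage; as noted above, that depends
       on the pattern as well as on the subject length.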

UTF-8 AND UNICODE PROPERTY SUPPORT

       From release 3.3, PCRE has  had	some  support  for  character  strings
       encoded	in the UTF-8 format. For release 4.0 this was greatly extended
       to cover most common requirements, and in release 5.0  additional  sup‐
       port for Unicode general category properties was added.

       In order to process UTF-8 strings, you must build PCRE to include UTF-8
       support in the code, and, in addition,  you  must  call	pcre_compile()
       with  the PCRE_UTF8 option flag. When you do this, both the pattern and
       any subject strings that are matched against it are  treated  as	 UTF-8
       strings instead of just strings of bytes.

       If  you compile PCRE with UTF-8 support, but do not use it at run time,
       the library will be a bit bigger, but the additional run time  overhead
       is  limited  to testing the PCRE_UTF8 flag in several places, so should
       not be very large.

       If PCRE is built with Unicode character property support (which implies
       UTF-8  support),	 the  escape sequences \p{..}, \P{..}, and \X are sup‐
       ported.	The available properties that can be tested are limited to the
       general	category  properties such as Lu for an upper case letter or Nd
       for a decimal number, the Unicode script names such as Arabic  or  Han,
       and  the	 derived  properties  Any  and L&. A full list is given in the
       pcrepattern documentation. Only the short names for properties are sup‐
       ported.	For example, \p{L} matches a letter. Its Perl synonym, \p{Let‐
       ter}, is not supported.	Furthermore,  in  Perl,	 many  properties  may
       optionally  be  prefixed by "Is", for compatibility with Perl 5.6. PCRE
       does not support this.

       The following comments apply when PCRE is running in UTF-8 mode:

       1. When you set the PCRE_UTF8 flag, the strings passed as patterns  and
       subjects	 are  checked for validity on entry to the relevant functions.
       If an invalid UTF-8 string is passed, an error return is given. In some
       situations,  you	 may  already  know  that  your strings are valid, and
       therefore want to skip these checks in order to improve performance. If
       you  set	 the  PCRE_NO_UTF8_CHECK  flag at compile time or at run time,
       PCRE assumes that the pattern or subject	 it  is	 given	(respectively)
       contains	 only valid UTF-8 codes. In this case, it does not diagnose an
       invalid UTF-8 string. If you pass an invalid UTF-8 string to PCRE  when
       PCRE_NO_UTF8_CHECK  is set, the results are undefined. Your program may
       crash.

       2. An unbraced hexadecimal escape sequence (such	 as  \xb3)  matches  a
       two-byte UTF-8 character if the value is greater than 127.

       3.  Repeat quantifiers apply to complete UTF-8 characters, not to indi‐
       vidual bytes, for example: \x{100}{3}.

       4. The dot metacharacter matches one UTF-8 character instead of a  sin‐
       gle byte.

       5.  The	escape sequence \C can be used to match a single byte in UTF-8
       mode, but its use can lead to some strange effects.  This  facility  is
       not available in the alternative matching function, pcre_dfa_exec().

       6.  The	character escapes \b, \B, \d, \D, \s, \S, \w, and \W correctly
       test characters of any code value, but the characters that PCRE	recog‐
       nizes  as  digits,  spaces,  or	word characters remain the same set as
       before, all with values less than 256. This remains true even when PCRE
       includes	 Unicode  property support, because to do otherwise would slow
       down PCRE in many common cases. If you really want to test for a	 wider
       sense  of,  say,	 "digit",  you must use Unicode property tests such as
       \p{Nd}.

       7. Similarly, characters that match the POSIX named  character  classes
       are all low-valued characters.

       8.  Case-insensitive  matching  applies only to characters whose values
       are less than 128, unless PCRE is built with Unicode property  support.
       Even  when  Unicode  property support is available, PCRE still uses its
       own character tables when checking the case of  low-valued  characters,
       so  as not to degrade performance.  The Unicode property information is
       used only for characters with higher values. Even when Unicode property
       support is available, PCRE supports case-insensitive matching only when
       there is a one-to-one mapping between a letter's	 cases.	 There	are  a
       small  number  of  many-to-one  mappings in Unicode; these are not sup‐
       ported by PCRE.
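
       Comments 2 to 4 above can be made concrete with plain C  that  involves
       no  PCRE  calls.  The sketch below is illustrative only: utf8_encode and
       utf8_char_count are hypothetical helper names, the encoder  covers  code
       points  up  to  U+10FFFF  and rejects surrogates, and the counter assumes
       that its input is already valid UTF-8.

```c
#include <stddef.h>

/* Encode code point cp as UTF-8 into out. Returns the number of bytes
   written (1 to 4), or 0 if cp is a surrogate or above U+10FFFF. */
size_t utf8_encode(unsigned long cp, unsigned char out[4])
{
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                               /* surrogate: rejected */
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;             /* single byte, as in ASCII */
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp < 0x110000) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                                   /* out of range */
}

/* Count characters, not bytes, in a NUL-terminated string that is
   assumed to be valid UTF-8: every byte that is not a continuation
   byte (binary 10xxxxxx) starts a new character. */
size_t utf8_char_count(const char *s)
{
    size_t count = 0;

    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            count++;
    }
    return count;
}
```

       Under these assumptions, utf8_encode(0xb3, buf) stores  the  two  bytes
       0xC2  0xB3,  which  is  the  character that \xb3 stands for in UTF-8 mode
       (comment 2). A subject matched by \x{100}{3} consists of three characters
       occupying  six bytes, 0xC4 0x80 repeated three times, so utf8_char_count
       reports 3 for it while strlen() would report 6 (comments 3 and 4).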

AUTHOR

       Philip Hazel
       University Computing Service,
       Cambridge CB2 3QG, England.

       Putting an actual email address here seems to have been a spam  magnet,
       so I've taken it away. If you want to email me, use my initial and sur‐
       name, separated by a dot, at the domain ucs.cam.ac.uk.

Last updated: 24 January 2006
Copyright (c) 1997-2006 University of Cambridge.

								       PCRE(3)