PCRE2COMPAT(3)							PCRE2COMPAT(3)

NAME
       PCRE2 - Perl-compatible regular expressions (revised API)

DIFFERENCES BETWEEN PCRE2 AND PERL

       This document describes the differences in the ways that PCRE2 and Perl
       handle regular expressions. The differences  described  here  are  with
       respect to Perl versions 5.10 and above.

       1.  PCRE2  has only a subset of Perl's Unicode support. Details of what
       it does have are given in the pcre2unicode page.

       2. PCRE2 allows repeat quantifiers only	on  parenthesized  assertions,
       but  they  do not mean what you might think. For example, (?!a){3} does
       not assert that the next three characters are not "a". It just  asserts
       that  the  next	character  is not "a" three times (in principle: PCRE2
       optimizes this to run the assertion  just  once).  Perl	allows	repeat
       quantifiers  on	other  assertions such as \b, but these do not seem to
       have any use.

       3. Capturing subpatterns that occur inside  negative  lookahead	asser‐
       tions  are  counted,  but their entries in the offsets vector are never
       set. Perl sometimes (but not always) sets its numerical variables  from
       inside negative assertions.

       4.  The	following Perl escape sequences are not supported: \l, \u, \L,
       \U, and \N when followed by a character name or Unicode value.  (\N  on
       its own, matching a non-newline character, is supported.) In fact these
       are implemented by Perl's general string-handling and are not  part  of
       its  pattern matching engine. If any of these are encountered by PCRE2,
       an error is generated by default. However, if the PCRE2_ALT_BSUX option
       is set, \U and \u are interpreted as ECMAScript interprets them.

       5. The Perl escape sequences \p, \P, and \X are supported only if PCRE2
       is built with Unicode support. The properties that can be  tested  with
       \p and \P are limited to the general category properties such as Lu and
       Nd, script names such as Greek or Han, and the derived  properties  Any
       and L&. PCRE2 does support the Cs (surrogate) property, which Perl does
       not; the Perl documentation says "Because Perl hides the need  for  the
       user  to	 understand the internal representation of Unicode characters,
       there is no need to implement the  somewhat  messy  concept  of	surro‐
       gates."

       6.  PCRE2 does support the \Q...\E escape for quoting substrings. Char‐
       acters in between are treated as literals. This is  slightly  different
       from  Perl  in  that  $	and  @ are also handled as literals inside the
       quotes. In Perl, they cause variable interpolation (but of course PCRE2
       does not have variables).  Note the following examples:

           Pattern            PCRE2 matches     Perl matches

           \Qabc$xyz\E        abc$xyz           abc followed by the
                                                  contents of $xyz
           \Qabc\$xyz\E       abc\$xyz          abc\$xyz
           \Qabc\E\$\Qxyz\E   abc$xyz           abc$xyz

       The  \Q...\E  sequence  is recognized both inside and outside character
       classes.

       7.  Fairly  obviously,  PCRE2  does  not	 support  the  (?{code})   and
       (??{code})  constructions. However, there is support for recursive pat‐
       terns. This is not available in Perl 5.8, but it is in Perl 5.10. Also,
       the  PCRE2  "callout"  feature allows an external function to be called
       during  pattern	matching.  See	the  pcre2callout  documentation   for
       details.

       8.  Subroutine  calls  (whether recursive or not) are treated as atomic
       groups.	Atomic recursion is like Python,  but  unlike  Perl.  Captured
       values  that  are  set outside a subroutine call can be referenced from
       inside in PCRE2, but not in Perl. There is a discussion	that  explains
       these  differences  in  more detail in the section on recursion differ‐
       ences from Perl in the pcre2pattern page.

       9. If any of the backtracking control verbs are used  in	 a  subpattern
       that  is	 called	 as  a	subroutine (whether or not recursively), their
       effect is confined to that subpattern; it does not extend to  the  sur‐
       rounding	 pattern.  This is not always the case in Perl. In particular,
       if (*THEN) is present in a group that is called as  a  subroutine,  its
       action is limited to that group, even if the group does not contain any
       | characters. Note that such subpatterns are processed as  anchored  at
       the point where they are tested.

       10.  If a pattern contains more than one backtracking control verb, the
       first one that is backtracked onto acts. For example,  in  the  pattern
       A(*COMMIT)B(*PRUNE)C  a	failure in B triggers (*COMMIT), but a failure
       in C triggers (*PRUNE). Perl's behaviour is more complex; in many cases
       it is the same as PCRE2, but there are examples where it differs.

       11.  Most  backtracking	verbs in assertions have their normal actions.
       They are not confined to the assertion.

       12. There are some differences that are concerned with the settings  of
       captured	 strings  when	part  of  a  pattern is repeated. For example,
       matching "aba" against the  pattern  /^(a(b)?)+$/  in  Perl  leaves  $2
       unset, but in PCRE2 it is set to "b".

       13. PCRE2's handling of duplicate subpattern numbers and duplicate
       subpattern names is not as general as Perl's. This is a consequence of
       the fact that PCRE2 works internally just with numbers, using an
       external table to translate between numbers and names. In particular,
       a pattern such as (?|(?<a>A)|(?<b>B)), where the two capturing
       parentheses have the same number but different names, is not
       supported, and causes an error at compile time. If it were allowed, it
       would not be possible to distinguish which parentheses matched,
       because both names map to capturing subpattern number 1.

       14. Perl recognizes comments in some places that PCRE2  does  not,  for
       example,	 between  the  ( and ? at the start of a subpattern. If the /x
       modifier is set, Perl allows white space between ( and ?	 (though  cur‐
       rent  Perls warn that this is deprecated) but PCRE2 never does, even if
       the PCRE2_EXTENDED option is set.

       15. Perl, when in warning mode, gives warnings  for  character  classes
       such  as	 [A-\d] or [a-[:digit:]]. It then treats the hyphens as liter‐
       als. PCRE2 has no warning features, so it gives an error in these cases
       because they are almost certainly user mistakes.

       16.  In	PCRE2, the upper/lower case character properties Lu and Ll are
       not affected when case-independent matching is specified. For  example,
       \p{Lu} always matches an upper case letter. I think Perl has changed in
       this respect; in the release at the time of writing (5.16), \p{Lu}  and
       \p{Ll} match all letters, regardless of case, when case independence is
       specified.

       17. PCRE2 provides some	extensions  to	the  Perl  regular  expression
       facilities.   Perl  5.10	 includes new features that are not in earlier
       versions of Perl, some of which (such as named parentheses)  have  been
       in PCRE2 for some time. This list is with respect to Perl 5.10:

       (a)  Although  lookbehind  assertions  in PCRE2 must match fixed length
       strings, each alternative branch of a lookbehind assertion can match  a
       different  length  of  string.  Perl requires them all to have the same
       length.

       (b) If PCRE2_DOLLAR_ENDONLY is set and PCRE2_MULTILINE is not set,  the
       $ meta-character matches only at the very end of the string.

       (c)  A  backslash  followed  by	a  letter  with	 no special meaning is
       faulted. (Perl can be made to issue a warning.)

       (d) If PCRE2_UNGREEDY is set, the greediness of the repetition  quanti‐
       fiers is inverted, that is, by default they are not greedy, but if fol‐
       lowed by a question mark they are.

       (e) PCRE2_ANCHORED can be used at matching time to force a  pattern  to
       be tried only at the first matching position in the subject string.

       (f) The PCRE2_NOTBOL, PCRE2_NOTEOL, PCRE2_NOTEMPTY,
       PCRE2_NOTEMPTY_ATSTART, and PCRE2_NO_AUTO_CAPTURE options have no Perl
       equivalents.

       (g)  The	 \R escape sequence can be restricted to match only CR, LF, or
       CRLF by the PCRE2_BSR_ANYCRLF option.

       (h) The callout facility is PCRE2-specific.

       (i) The partial matching facility is PCRE2-specific.

       (j) The alternative matching function (pcre2_dfa_match()) matches in a
       different way and is not Perl-compatible.

       (k)  PCRE2 recognizes some special sequences such as (*CR) at the start
       of a pattern that set overall options that cannot be changed within the
       pattern.

AUTHOR

       Philip Hazel
       University Computing Service
       Cambridge, England.

REVISION

       Last updated: 15 March 2015
       Copyright (c) 1997-2015 University of Cambridge.

PCRE2 10.20			 15 March 2015			PCRE2COMPAT(3)