RWCRExpr(3C++) RWCRExpr(3C++)
Name
RWCRExpr - Rogue Wave library class
Synopsis
#include <rw/re.h>
RWCRExpr re(".*\\.doc"); // Matches filename with suffix ".doc"
Description
Class RWCRExpr represents an extended regular expression such as those
found in lex and awk. The constructor "compiles" the expression into a
form that can be used more efficiently. The results can then be used for
string searches using class RWCString. Regular expressions can be of
arbitrary size, limited by memory. The extended regular expression
features found here are a subset of those found in the POSIX.2 standard
(ANSI/IEEE Std 1003.2, ISO/IEC 9945-2).
Note: RWCRExpr is available only if your compiler supports exception
handling and the C++ Standard Library.
The regular expression (RE) is constructed as follows:
The following rules determine one-character REs that match a single
character:
Any character that is not a special character (to be defined)
matches itself.
A backslash (\) followed by any special character matches the
literal character itself; that is, this "escapes" the special
character.
The "special characters" are: + * ? . [ ] ^ $ ( ) { } | \
The period (.) matches any character. E.g., ".umpty" matches either
"Humpty" or "Dumpty."
A set of characters enclosed in brackets ([ ]) is a one-character RE
that matches any of the characters in that set. E.g., "[akm]"
matches either an "a", "k", or "m". A range of characters can be
indicated with a dash. E.g., "[a-z]" matches any lower-case letter.
However, if the first character of the set is the caret (^), then
the RE matches any character except those in the set. It does not
match the empty string. Example: [^akm] matches any character
except "a", "k", or "m". The caret loses its special meaning if it
is not the first character of the set.
The following rules can be used to build a multicharacter RE:
Parentheses (( )) group parts of regular expressions together into
subexpressions that can be treated as a single unit. For example,
(ha)+ matches one or more "ha"'s.
A one-character RE followed by an asterisk (*) matches zero or more
occurrences of the RE. Hence, [a-z]* matches zero or more
lower-case characters.
A one-character RE followed by a plus (+) matches one or more
occurrences of the RE. Hence, [a-z]+ matches one or more lower-case
characters.
A question mark (?) makes the preceding RE optional: it can occur
zero times or once in the string, no more. E.g., xy?z matches
either xyz or xz.
The concatenation of REs is a RE that matches the corresponding
concatenation of strings. E.g., [A-Z][a-z]* matches any capitalized
word.
The OR character ( | ) allows a choice between two regular
expressions. For example, jell(y|ies) matches either "jelly" or
"jellies".
Braces ({ }) are reserved for future use.
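The one-character and multicharacter rules above can be tried out with a
small sketch. RWCRExpr itself is not used here: as an assumption, the
sketch substitutes std::regex with its POSIX extended grammar
(std::regex::extended), which follows the same rules for the constructs
shown. The helper names found and firstMatch are hypothetical and are not
part of the Rogue Wave library. Braces are left out because std::regex
gives them a meaning, whereas RWCRExpr reserves them.

```cpp
#include <regex>
#include <string>

// Stand-in for RWCRExpr: std::regex with the POSIX extended grammar.
// Returns true if `pattern` matches somewhere inside `text`.
bool found(const std::string& pattern, const std::string& text) {
    const std::regex re(pattern, std::regex::extended);
    return std::regex_search(text, re);
}

// Returns the text of the first match of `pattern` in `text`.
// Returns an empty string when there is no match (an empty match
// also yields an empty string, so this helper cannot tell them apart).
std::string firstMatch(const std::string& pattern, const std::string& text) {
    const std::regex re(pattern, std::regex::extended);
    std::smatch m;
    return std::regex_search(text, m, re) ? m.str() : std::string();
}
```

For instance, found(".umpty", "Dumpty") is true, found("[^akm]", "a") is
false, and firstMatch("jell(y|ies)", "two jellies") returns "jellies".
Note that the backslash must be doubled inside a C++ string literal, as in
firstMatch("\\.doc", "a.doc").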
All or part of the regular expression can be "anchored" to either
the beginning or end of the string being searched:
If the caret (^) is at the beginning of the (sub)expression, then
the matched string must be at the beginning of the string being
searched.
If the dollar sign ($) is at the end of the (sub)expression, then
the matched string must be at the end of the string being searched.
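The effect of the two anchors can be sketched as follows. As an
assumption, std::regex with the POSIX extended grammar stands in for
RWCRExpr; the helper name matchesSomewhere is hypothetical.

```cpp
#include <regex>
#include <string>

// Stand-in for an RWCRExpr search: true if `pattern` (POSIX extended
// grammar) matches anywhere in `text`. An anchor in the pattern
// restricts where that match may begin or end.
bool matchesSomewhere(const std::string& pattern, const std::string& text) {
    const std::regex re(pattern, std::regex::extended);
    return std::regex_search(text, re);
}
```

With the string "Hark! Hark! the lark", the pattern "^Hark" matches
because the string begins with "Hark", while "^lark" does not. Likewise
"lark$" matches because the string ends with "lark", while "Hark$" does
not.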
Persistence
None
Example
#include <rw/re.h>
#include <rw/cstring.h>
#include <rw/rstream.h>
int main() {
    RWCString aString("Hark! Hark! the lark");
    // A regular expression matching any lowercase word, or end of a
    // word, starting with "l":
    RWCRExpr re("l[a-z]*");
    std::cout << aString(re) << std::endl;  // Prints "lark"
    return 0;
}
Public Constructors
RWCRExpr(const char* pat);
RWCRExpr(const RWCString& pat);
Construct a regular expression from the pattern given by pat. The status
of the results can be found by using member function status().
RWCRExpr(const RWCRExpr& r);
Copy constructor. Uses value semantics -- self will be a copy of r.
RWCRExpr();
Default constructor. You must assign a pattern to the regular expression
before you use it.
Public Destructor
~RWCRExpr();
Destructor. Releases any allocated memory.
Assignment Operators
RWCRExpr&
operator=(const RWCRExpr& r);
Recompiles self to the pattern found in r.
RWCRExpr&
operator=(const char* pat);
RWCRExpr&
operator=(const RWCString& pat);
Recompiles self to the pattern given by pat. The status of the results
can be found by using member function status().
Public Member Functions
size_t
index(const RWCString& str, size_t* len = NULL,
size_t start=0) const;
Returns the index of the first instance in the string str that matches
the regular expression compiled in self, or RW_NPOS if there is no such
match. The search starts at index start. The length of the matching
pattern is returned in the variable pointed to by len. If an invalid
regular expression is used for the search, an exception of type
RWInternalErr will be thrown. Note that this member function is
relatively clumsy to use -- class RWCString offers a better interface to
regular expression searches.
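The calling convention of index() (a returned position, a length written
through an optional pointer, and a sentinel for "no match") can be
illustrated with a stand-in. This sketch does not call RWCRExpr: as an
assumption, it rebuilds the same contract on top of std::regex. The names
indexOf and NPOS are hypothetical, with NPOS playing the role of RW_NPOS.
The RWInternalErr exception and the interaction of anchors with a nonzero
start are not modeled.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Plays the role of RW_NPOS in this sketch.
const std::size_t NPOS = std::string::npos;

// Hypothetical stand-in for RWCRExpr::index(), built on std::regex.
// Searches `str` from position `start` and returns the index of the
// first match, or NPOS if there is none. When `len` is non-null and a
// match is found, the match length is stored in *len.
std::size_t indexOf(const std::regex& re, const std::string& str,
                    std::size_t* len = nullptr, std::size_t start = 0) {
    if (start > str.size()) {
        return NPOS;
    }
    std::smatch m;
    const auto first = str.begin() + static_cast<std::ptrdiff_t>(start);
    if (!std::regex_search(first, str.end(), m, re)) {
        return NPOS;
    }
    if (len != nullptr) {
        *len = static_cast<std::size_t>(m.length(0));
    }
    // m.position(0) is relative to `first`, so add `start` back.
    return start + static_cast<std::size_t>(m.position(0));
}
```

A caller checks the return value against the sentinel before reading the
length, which is the extra bookkeeping that makes this style of interface
clumsier than searching through the string class directly.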
statusType
status() const;
Returns the status of the regular expression:
statusType                       Meaning
RWCRExpr::OK                     No errors
RWCRExpr::NOT_SUPPORTED          POSIX.2 feature not yet supported
RWCRExpr::NO_MATCH               Tried to find a match but failed
RWCRExpr::BAD_PATTERN            Pattern was illegal
RWCRExpr::BAD_COLLATING_ELEMENT  Invalid collating element referenced
RWCRExpr::BAD_CHAR_CLASS_TYPE    Invalid character class type referenced
RWCRExpr::TRAILING_BACKSLASH     Trailing \ in pattern
RWCRExpr::UNMATCHED_BRACKET      [] imbalance
RWCRExpr::UNMATCHED_PARENTHESIS  () imbalance
RWCRExpr::UNMATCHED_BRACE        {} imbalance
RWCRExpr::BAD_BRACE              Content of {} invalid
RWCRExpr::BAD_CHAR_RANGE         Invalid endpoint in [a-z] expression
RWCRExpr::OUT_OF_MEMORY          Out of memory
RWCRExpr::BAD_REPEAT             ?, * or + not preceded by valid regular expression
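Several of these codes describe malformed patterns, such as unbalanced
brackets or parentheses. For comparison only, the sketch below shows how
the same kind of mistake surfaces in std::regex, which is used here as an
assumed stand-in: it reports a bad pattern by throwing std::regex_error
from the constructor, whereas RWCRExpr records the condition for later
inspection through status(). The helper name compiles is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns true if `pattern` compiles as a POSIX
// extended regular expression under std::regex, false if the
// constructor rejects it by throwing std::regex_error.
bool compiles(const std::string& pattern) {
    try {
        const std::regex re(pattern, std::regex::extended);
        (void)re;  // Only the success of construction matters here.
        return true;
    } catch (const std::regex_error&) {
        return false;
    }
}
```

Here compiles("l[a-z]*") is true, while compiles("[a-z") and
compiles("(ha") are false, which corresponds loosely to the
UNMATCHED_BRACKET and UNMATCHED_PARENTHESIS conditions in the table.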