regcomp(3)
NAME
regcomp, regerror, regexec, regfree - Compare string to regular expression
SYNOPSIS
#include <sys/types.h>
#include <regex.h>

int regcomp(
        regex_t *preg,
        const char *pattern,
        int cflags );

size_t regerror(
        int errcode,
        const regex_t *preg,
        char *errbuf,
        size_t errbuf_size );

int regexec(
        const regex_t *preg,
        const char *string,
        size_t nmatch,
        regmatch_t *pmatch,
        int eflags );

void regfree(
        regex_t *preg );
LIBRARY
Standard C Library (libc)
STANDARDS
Interfaces documented on this reference page conform to industry stan‐
dards as follows:
regcomp(), regexec(), regerror(), regfree(): POSIX.2, XPG4, XPG4-UNIX
Refer to the standards(5) reference page for more information about
industry standards and associated tags.
PARAMETERS
cflags
    Specifies the options for regcomp(). The cflags parameter is the
    bitwise inclusive OR of zero or more of the following options,
    which are defined in the /usr/include/regex.h file.
    REG_EXTENDED
        Uses extended regular expressions.
    REG_ICASE
        Ignores case in match.
    REG_NOSUB
        Reports only success or failure in regexec(); does not report
        subexpressions.
    REG_NEWLINE
        Treats newline as a special character marking the end and
        beginning of lines.
pattern
    Contains the basic or extended regular expression to be compiled
    by regcomp().
preg
    The structure that contains the compiled basic or extended regular
    expression.
errcode
    Identifies the error code.
errbuf
    Points to the buffer where regerror() stores the message text.
errbuf_size
    Specifies the size of the errbuf buffer.
string
    Contains the data to be matched.
nmatch
    Contains the number of subexpressions to match.
pmatch
    Contains the array of offsets into the string parameter that match
    the corresponding subexpression in the preg parameter.
eflags
    Specifies the options controlling the customizable behavior of the
    regexec() function. The eflags parameter modifies the
    interpretation of the contents of the string parameter. The value
    for this parameter is formed by bitwise inclusive ORing zero or
    more of the following options, which are defined in the
    /usr/include/regex.h file.
    REG_NOTBOL
        The first character of the string pointed to by the string
        parameter is not the beginning of the line. Therefore, the ^
        (circumflex), when taken as a special character, does not
        match the beginning of the string parameter.
    REG_NOTEOL
        The last character of the string pointed to by the string
        parameter is not the end of the line. Therefore, the $ (dollar
        sign), when taken as a special character, does not match the
        end of the string parameter.
DESCRIPTION
The regcomp(), regerror(), regexec(), and regfree() functions perform
regular expression matching. The regcomp() function compiles a regular
expression and the regexec() function compares the compiled regular
expression to a string. The regerror() function returns text associated
with an error condition encountered by regcomp() or regexec(). The
regfree() function frees the internal storage allocated for the com‐
piled regular expression.
The regcomp() function compiles the basic or extended regular expres‐
sion specified by the pattern parameter and places the output in the
preg structure. The default regular expression type for the pattern
parameter is a basic regular expression. An application can specify
extended regular expressions with the REG_EXTENDED option.
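As a minimal sketch of the difference between the two pattern types, the
helper below compiles a pattern with the caller's cflags and tests one
string against it. The name matches() is hypothetical and is not part of
this interface; REG_NOSUB is added only because the sketch does not need
subexpression offsets.

```c
#include <sys/types.h>
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of the regcomp interface): returns 1 if
 * string matches pattern, 0 if it does not, and -1 if the pattern fails
 * to compile or regexec() reports an error.
 */
int matches(const char *pattern, const char *string, int cflags)
{
    regex_t preg;
    int rc;

    /* Only success or failure is needed, so request no subexpressions */
    if (regcomp(&preg, pattern, cflags | REG_NOSUB) != 0)
        return -1;

    rc = regexec(&preg, string, 0, NULL, 0);
    regfree(&preg);

    if (rc == 0)
        return 1;
    return (rc == REG_NOMATCH) ? 0 : -1;
}
```

Under these assumptions, matches("ab+c", "abbc", REG_EXTENDED) reports a
match, while matches("ab+c", "abbc", 0) does not, because + is an
ordinary character in a basic regular expression.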
If the REG_NOSUB option is not set in cflags, the regcomp() function
sets the re_nsub member of the preg structure to the number of
parenthetic subexpressions (delimited by \( and \) in basic regular
expressions or by () in extended regular expressions) found in pattern.
The regexec() function compares the null-terminated string in the
string parameter against the compiled basic or extended regular expres‐
sion in the preg parameter. If a match is found, the regexec() func‐
tion returns a value of 0 (zero). The regexec() function returns
REG_NOMATCH if there is no match. Any other nonzero value returned
indicates an error.
If the value of the nmatch parameter is 0 (zero) or if the REG_NOSUB
option was set on the call to the regcomp() function, the regexec()
function ignores the pmatch parameter. Otherwise, the pmatch parameter
points to an array of at least the number of elements specified by the
nmatch parameter. The regexec() function fills in the elements of the
array pointed to by the pmatch parameter with offsets of the substrings
of the string parameter. The elements of the pmatch array correspond to
the parenthetic subexpressions of the original pattern parameter that
was specified to the regcomp() function. The pmatch[i].rm_so member
is the byte offset of the beginning of the substring, and the
pmatch[i].rm_eo member is one greater than the byte offset of the
end of the substring. Subexpression i begins at the ith matched open
parenthesis, counting from 1. The 0 (zero) element of the array corre‐
sponds to the entire pattern. Unused elements of the pmatch parameter,
up to the value pmatch[nmatch-1], are filled with -1. If the number of
subexpressions exceeds the number specified by the nmatch parameter
(the pattern parameter itself counts as a subexpression), only the
first nmatch-1 are recorded.
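The offset bookkeeping can be sketched as follows. The function name
find_pair() and the key=value pattern are assumptions chosen for
illustration; only regcomp(), regexec(), regfree(), and the rm_so and
rm_eo members come from this reference page.

```c
#include <sys/types.h>
#include <regex.h>

#define PAIR_NMATCH 3   /* element 0 (whole match) plus two subexpressions */

/*
 * Hypothetical helper: searches string for name=number and stores the
 * offsets in pmatch, which must have PAIR_NMATCH elements.
 * Returns 0 on a match, REG_NOMATCH, or another regcomp()/regexec() code.
 */
int find_pair(const char *string, regmatch_t pmatch[PAIR_NMATCH])
{
    regex_t preg;
    int rc;

    /* Illustrative pattern: subexpression 1 is the name, 2 is the number */
    rc = regcomp(&preg, "([a-z]+)=([0-9]+)", REG_EXTENDED);
    if (rc != 0)
        return rc;

    rc = regexec(&preg, string, PAIR_NMATCH, pmatch, 0);
    regfree(&preg);
    return rc;
}
```

For the string "set width=42;", pmatch[0] covers bytes 4 through 11
(rm_so is 4, rm_eo is 12), pmatch[1] covers "width" (4 and 9), and
pmatch[2] covers "42" (10 and 12). The length of each substring is
rm_eo minus rm_so.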
When matching a basic or extended regular expression, any given
parenthetic subexpression of the pattern parameter can participate in
the match of several different substrings of the string parameter;
however, it may not match any substring even though the pattern as a
whole did match. The following rules are used to determine which
substrings to report in the pmatch parameter when matching regular
expressions:
    If a subexpression in a regular expression participated in the
    match several times, the offset of the last matching substring is
    reported in the pmatch parameter.
    If a subexpression did not participate in a match, the byte offset
    in the pmatch parameter is a value of -1.
    If a subexpression is contained in a subexpression, the data in the
    pmatch parameter refers to the last such subexpression.
    If a subexpression is contained in a subexpression and the byte
    offsets in the pmatch parameter have a value of -1, the pointers in
    the pmatch parameter also have a value of -1.
    If a subexpression matched a zero-length string, the offsets in the
    pmatch parameter refer to the byte immediately following the
    matching string.
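The first two reporting rules can be observed with a short sketch. The
helper name run_ere() is hypothetical; it compiles an extended regular
expression, runs it once, and leaves the offsets in the caller's array.

```c
#include <sys/types.h>
#include <regex.h>

/*
 * Hypothetical helper: compiles pattern as an extended regular
 * expression, matches it against string, and fills nmatch elements of
 * pmatch. Returns the regcomp() error code if compilation fails,
 * otherwise the regexec() result.
 */
int run_ere(const char *pattern, const char *string,
            size_t nmatch, regmatch_t *pmatch)
{
    regex_t preg;
    int rc;

    rc = regcomp(&preg, pattern, REG_EXTENDED);
    if (rc != 0)
        return rc;

    rc = regexec(&preg, string, nmatch, pmatch, 0);
    regfree(&preg);
    return rc;
}
```

Called with the pattern "(ab|cd)+(x)?", the string "abcd", and a
three-element array, the whole match in pmatch[0] spans offsets 0 to 4.
Subexpression 1 participated twice, so pmatch[1] reports the last
substring it matched, "cd" (rm_so is 2, rm_eo is 4). Subexpression 2 did
not participate, so pmatch[2].rm_so and pmatch[2].rm_eo are both -1.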
If the REG_NOSUB option was set in the cflags parameter in the call to
the regcomp() function and the nmatch parameter is not equal to 0
(zero) in the call to the regexec() function, the content of the pmatch
array is unspecified.
If the REG_NEWLINE option was not set in the cflags parameter when the
regcomp() function was called, a newline character in the pattern or
string parameter is treated as an ordinary character. If the
REG_NEWLINE option was set when the regcomp() function was called, the
newline character is treated as an ordinary character, except as
follows:
    A newline character in the string parameter is not matched by a .
    (dot) outside of a bracket expression or by any form of a
    nonmatching list.
    A ^ (circumflex) in the pattern parameter, when used to specify
    expression anchoring, matches the zero-length string immediately
    after a newline character in the string parameter, regardless of
    the setting of the REG_NOTBOL option.
    A $ (dollar sign) in the pattern parameter, when used to specify
    expression anchoring, matches the zero-length string immediately
    before a newline character in the string parameter, regardless of
    the setting of the REG_NOTEOL option.
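A small sketch makes the effect of REG_NEWLINE visible. The helper name
find_line() is an assumption for illustration; the caller chooses the
cflags value so the same pattern can be tried with and without the
option.

```c
#include <sys/types.h>
#include <regex.h>

/*
 * Hypothetical helper: compiles pattern with the given cflags, searches
 * text, and stores the offsets of the whole match in *where.
 * Returns the regcomp() error code if compilation fails, otherwise the
 * regexec() result (0 for a match, REG_NOMATCH for none).
 */
int find_line(const char *pattern, const char *text,
              int cflags, regmatch_t *where)
{
    regex_t preg;
    int rc;

    rc = regcomp(&preg, pattern, cflags);
    if (rc != 0)
        return rc;

    rc = regexec(&preg, text, 1, where, 0);
    regfree(&preg);
    return rc;
}
```

With the text "one\ntwo\nthree", the pattern "^two$" returns REG_NOMATCH
when cflags is REG_EXTENDED alone, because the anchors apply only to the
ends of the whole string. With REG_EXTENDED | REG_NEWLINE the call
returns 0 and the match spans offsets 4 to 7. Similarly, "one.two"
matches "one\ntwo" only when REG_NEWLINE is not set.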
The regerror() function returns the text associated with the specified
error code. If the regcomp() or regexec() function fails, it returns a
nonzero error code. If this return value is assigned to the errcode
parameter, the regerror() function returns the text of the associated
message.
If the errbuf_size parameter is not 0, regerror() places the generated
string into the buffer of errbuf_size bytes pointed to by errbuf. If
the string (including the terminating null) cannot fit in the buffer,
regerror() truncates the string and null-terminates the result.
If errbuf_size is 0, regerror() ignores the errbuf parameter and
returns the size of the buffer needed to hold the generated string.
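The two behaviors combine into a common sizing idiom, sketched below:
call regerror() once with a size of 0 to learn the required length, then
call it again with a buffer of that size. The function name
regex_message() is hypothetical, and allocating with malloc() is a
choice made for this sketch, not a requirement of the interface.

```c
#include <sys/types.h>
#include <regex.h>
#include <stdlib.h>

/*
 * Hypothetical helper: returns a newly allocated string holding the
 * complete message for errcode, or NULL if the size query or the
 * allocation fails. The caller releases the string with free().
 */
char *regex_message(int errcode, const regex_t *preg)
{
    size_t needed;
    char *buf;

    /* A size of 0 makes regerror() ignore errbuf and report the size */
    needed = regerror(errcode, preg, NULL, 0);
    if (needed == 0)
        return NULL;

    buf = malloc(needed);
    if (buf != NULL)
        (void)regerror(errcode, preg, buf, needed);   /* Fits exactly */
    return buf;
}
```

Because the buffer is sized from the first call, the message is never
truncated, which avoids the fixed-length buffer and the separate
truncation check.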
The regfree() function frees any memory allocated by the regcomp()
function associated with the preg parameter. An expression defined by
the preg parameter is no longer treated as a compiled basic or extended
regular expression after it is given to the regfree() function.
RETURN VALUES
Upon successful completion, the regcomp() function returns a value of 0
(zero). Otherwise, regcomp() returns an integer value indicating an
error as described below, and the contents of the preg parameter are
undefined. If the regcomp() function detects an invalid basic or
extended regular expression, it returns REG_BADPAT or an error code
that more precisely describes the error.
If the regexec() function finds a match, the function returns a value
of 0 (zero). Otherwise, it returns REG_NOMATCH to indicate no match or
REG_ENOSYS to indicate that the function is not supported.
Upon successful completion, the regerror() function returns the number
of bytes needed to hold the entire generated string. This value may be
greater than the value of the errbuf_size parameter. If regerror()
fails, it returns 0 (zero) to indicate that the function is not
implemented.
The regfree() function returns no value.
The following constants are defined as error return values:
REG_BADBR
    The contents within the pair \{ and \} are invalid: not a number,
    number too large, more than two numbers, or first number larger
    than second.
REG_BADPAT
    The pattern contains an invalid regular expression.
REG_BADRPT
    The ?, *, or + symbols are not preceded by a valid regular
    expression.
REG_EBRACE
    The use of a pair of \{ and \} or {} is unbalanced.
REG_EBRACK
    The use of [] is unbalanced.
REG_ECOLLATE
    An invalid collating element was referenced.
REG_ECTYPE
    An invalid character class type was referenced.
REG_EESCAPE
    The pattern contains a trailing \ (backslash).
REG_ENOSYS
    The function is unsupported.
REG_EPAREN
    The use of a pair of \( and \) or () is unbalanced or exceeds the
    allowable range. The range is set in the _REG_SUBEXP_MAX parameter
    of regex.h and is usually 49.
REG_ERANGE
    An endpoint in the range expression is invalid.
REG_ESPACE
    Insufficient memory space is available.
REG_ESUBREG
    The number in \digit is invalid or in error.
REG_NOMATCH
    The regexec() function did not find a match.
A further error value indicates that the pattern contains too many
parenthetic subexpressions.
ERRORS
These functions do not set errno to indicate an error.
EXAMPLES
The following example demonstrates how the REG_NOTBOL option can be
used with the regexec() function to find all substrings in a line that
match a pattern supplied by a user. The main() function in the example
accepts two input strings from the user. The match() function in the
example uses regcomp() and regexec() to search for matches.
#include <sys/types.h>
#include <regex.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>
#include <nl_types.h>
#include "reg_example.h"

#define SLENGTH 128

nl_catd catd;       /* File scope: match() also reads the catalog */
int match(char *pattern, char *string);

int main(void)
{
        char patt[SLENGTH], strng[SLENGTH];
        char *eol;

        (void)setlocale(LC_ALL, "");
        catd = catopen("reg_example.cat", NL_CAT_LOCALE);
        printf(catgets(catd,SET1,INPUT,
                "Enter a regular expression:"));
        if (fgets(patt, SLENGTH, stdin) == NULL)
                return 1;       /* No input */
        if ((eol = strchr(patt, '\n')) != NULL)
                *eol = '\0';    /* Replace newline with null */
        else
                return 1;       /* Line entered too long */
        printf(catgets(catd,SET1,COMPARE,
                "Enter string to compare\nString: "));
        if (fgets(strng, SLENGTH, stdin) == NULL)
                return 1;       /* No input */
        if ((eol = strchr(strng, '\n')) != NULL)
                *eol = '\0';    /* Replace newline with null */
        else
                return 1;       /* Line entered too long */
        return match(patt, strng);
}
int match(char *pattern, char *string)
{
        char message[SLENGTH];
        char *start_search;
        int error, count;
        size_t msize;
        regex_t preg;
        regmatch_t pmatch;

        error = regcomp(&preg, pattern,
                REG_ICASE | REG_EXTENDED);
        if (error) {
                msize = regerror(error, &preg, message, SLENGTH);
                printf("%s\n", message);
                if (msize > SLENGTH)
                        printf(catgets(catd,SET1,LOST,
                                "Additional text lost\n"));
                return 1;       /* preg is undefined; nothing to free */
        }
        error = regexec(&preg, string, 1, &pmatch, 0);
        if (error == REG_NOMATCH) {
                printf(catgets(catd,SET1,NO_MATCH,
                        "No matches in string\n"));
                regfree(&preg);
                return 0;
        } else if (error != 0) {
                msize = regerror(error, &preg, message, SLENGTH);
                printf("%s\n", message);
                if (msize > SLENGTH)
                        printf(catgets(catd,SET1,LOST,
                                "Additional text lost\n"));
                regfree(&preg);
                return 1;
        }
        count = 0;
        start_search = string;
        while (error == 0) {
                count++;
                /* Stop at an empty match so the loop cannot spin
                   on the same position */
                if (pmatch.rm_so == pmatch.rm_eo)
                        break;
                /* Offsets are relative to start_search, not string */
                start_search = start_search + pmatch.rm_eo;
                error = regexec(&preg, start_search, 1, &pmatch,
                        REG_NOTBOL);
        }
        printf(catgets(catd,SET1,MATCH,
                "There are %i matches\n"), count);
        regfree(&preg);
        catclose(catd);
        return 0;
}
The following example finds out which subexpressions in the regular
expression have matches in the string. This example uses the same
main() program as the preceding example. This example does not specify
REG_EXTENDED in the call to regcomp() and, consequently, uses basic
regular expressions, not extended regular expressions.
#define MAX_MATCH 10

int match(char *pattern, char *string)
{
        char message[SLENGTH];
        int error, count, matches_tocheck;
        size_t msize;
        regex_t preg;
        regmatch_t pmatch[MAX_MATCH];

        error = regcomp(&preg, pattern, REG_ICASE);
        if (error) {
                msize = regerror(error, &preg, message, SLENGTH);
                printf("regcomp: %s\n", message);
                if (msize > SLENGTH)
                        printf(catgets(catd,SET1,LOST,
                                "Additional text lost\n"));
                return 1;       /* preg is undefined; nothing to free */
        }
        /* pmatch[0] describes the entire match, so the array holds at
           most MAX_MATCH - 1 subexpressions */
        if (preg.re_nsub > MAX_MATCH - 1) {
                printf(catgets(catd,SET1,SUBEXPR,
                        "There are %1$i subexpressions, checking %2$i\n"),
                        (int)preg.re_nsub, MAX_MATCH - 1);
                matches_tocheck = MAX_MATCH - 1;
        } else {
                printf(catgets(catd,SET1,SUB_EXPR_NUM,
                        "There are %i subexpressions in the regular expression\n"),
                        (int)preg.re_nsub);
                matches_tocheck = (int)preg.re_nsub;
        }
        error = regexec(&preg, string, MAX_MATCH, &pmatch[0], 0);
        if (error == REG_NOMATCH) {
                printf(catgets(catd,SET1,NO_MATCH_ENT,
                        "String did not contain match for entire regular expression\n"));
                regfree(&preg);
                return 0;
        } else if (error != 0) {
                msize = regerror(error, &preg, message, SLENGTH);
                printf("regexec: %s\n", message);
                if (msize > SLENGTH)
                        printf(catgets(catd,SET1,LOST,
                                "Additional text lost\n"));
                regfree(&preg);
                return 1;
        } else
                printf(catgets(catd,SET1,MATCH_ENT,
                        "String contained match for the entire regular expression\n"));
        /* Element 0 is the entire pattern; 1..matches_tocheck are the
           subexpressions */
        for (count = 0; count <= matches_tocheck; count++) {
                if (pmatch[count].rm_so != -1) {
                        printf(catgets(catd,SET1,SUB_EXPR_MATCH,
                                "Subexpression %i matched in string\n"), count);
                        printf(catgets(catd,SET1,MATCH_WHERE,
                                "Match starts at %1$i. Byte after match is %2$i\n"),
                                (int)pmatch[count].rm_so,
                                (int)pmatch[count].rm_eo);
                } else
                        printf(catgets(catd,SET1,NO_MATCH_SUB,
                                "Subexpression %i had NO match\n"), count);
        }
        regfree(&preg);
        catclose(catd);
        return 0;
}
SEE ALSO
Commands: grep(1)
Standards: standards(5)