Mikroelektronika MIKROE-738 Datasheet
mikroC PRO for PIC32
MikroElektronika
Examples:
count.*r - matches strings like 'counter', 'countelkjdflkj9r' and 'countr'
count.+r - matches strings like 'counter', 'countelkjdflkj9r' but not 'countr'
count.?r - matches strings like 'counter', 'countar' and 'countr' but not 'countelkj9r'
counte{2}r - matches the string 'counteer'
counte{2,}r - matches strings like 'counteer', 'counteeer', 'counteeeer' etc.
counte{2,3}r - matches strings like 'counteer' or 'counteeer' but not 'counteeeer'
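The quantifier examples above can be checked mechanically. The following is a minimal sketch in Python, assuming that Python's `re` module treats these quantifiers the same way as the engine described here; the helper name `check_quantifiers` and the extra negative test strings are illustrative additions, not part of the manual.

```python
import re

# pattern -> (strings that should match, strings that should not match).
# Most strings come from the examples above; a few extra negatives
# ('counter', 'counteeer' for the {2} cases) are added for illustration.
CASES = {
    r"count.*r": (["counter", "countelkjdflkj9r", "countr"], []),
    r"count.+r": (["counter", "countelkjdflkj9r"], ["countr"]),
    r"count.?r": (["counter", "countar", "countr"], ["countelkj9r"]),
    r"counte{2}r": (["counteer"], ["counter", "counteeer"]),
    r"counte{2,}r": (["counteer", "counteeer", "counteeeer"], ["counter"]),
    r"counte{2,3}r": (["counteer", "counteeer"], ["counteeeer"]),
}


def check_quantifiers():
    """Return (pattern, text, expected, actual) for every mismatch."""
    failures = []
    for pattern, (good, bad) in CASES.items():
        for text in good + bad:
            expected = text in good
            # fullmatch requires the whole string to fit the pattern
            actual = re.fullmatch(pattern, text) is not None
            if actual != expected:
                failures.append((pattern, text, expected, actual))
    return failures
```

An empty list from `check_quantifiers()` means every example behaved as described.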
A little explanation about "greediness". A "greedy" quantifier takes as many characters as possible; a "non-greedy" one takes as few as possible. For example, 'b+' and 'b*' applied to the string 'abbbbc' return 'bbbb', 'b+?' returns 'b', 'b*?' returns an empty string, 'b{2,3}?' returns 'bb', and 'b{2,3}' returns 'bbb'.
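The greedy and non-greedy results above can be reproduced with a short Python sketch. It assumes Python's `re` module as a stand-in for the engine described here, and the helper `match_at_first_b` is a hypothetical name. The match is started at the first 'b' on purpose: in Python, a plain search for 'b*' from the start of 'abbbbc' succeeds immediately with an empty match in front of the 'a'.

```python
import re


def match_at_first_b(pattern, text="abbbbc"):
    """Try to match `pattern` starting exactly at the first 'b' of `text`."""
    start = text.index("b")
    m = re.compile(pattern).match(text, start)
    return m.group(0) if m else None


PATTERNS = ["b+", "b*", "b+?", "b*?", "b{2,3}?", "b{2,3}"]

# Map each pattern to the text it consumed.
results = {p: match_at_first_b(p) for p in PATTERNS}
```

With these assumptions, `results` holds 'bbbb' for both greedy patterns 'b+' and 'b*', 'b' for 'b+?', an empty string for 'b*?', 'bb' for 'b{2,3}?' and 'bbb' for 'b{2,3}'.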
Metacharacters - Alternatives
You can specify a series of alternatives for a pattern using "|" to separate them, so that bit|bat|bot will match any of "bit", "bat", or "bot" in the target string, as would "b(i|a|o)t". The first alternative includes everything from the last pattern delimiter ("(", "[", or the beginning of the pattern) up to the first "|", and the last alternative contains everything from the last "|" to the next pattern delimiter. For this reason, it's common practice to enclose alternatives in parentheses, to minimize confusion about where they start and end.
Alternatives are tried from left to right, so the first alternative for which the entire expression matches is the one that is chosen. This means that alternatives are not necessarily greedy. For example, when matching rou|rout against "routine", only the "rou" part will match, as that is the first alternative tried and it successfully matches the target string. (This might not seem important, but it matters when you are capturing matched text using parentheses.)
Also remember that "|" is interpreted as a literal within square brackets, so if you write [bit|bat|bot], you're really only matching [biato|].
Examples:
rou(tine|te) - matches strings 'routine' or 'route'.
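The left-to-right rule for alternatives, and the literal meaning of "|" inside square brackets, can be seen in a small Python sketch. It assumes Python's `re` module behaves like the engine described here for these constructs; the variable names are illustrative only.

```python
import re

# The leftmost alternative that lets the match succeed wins,
# even though "rout" would also fit the target string.
first = re.search(r"rou|rout", "routine").group(0)

# Listing the longer alternative first changes what is matched.
longer = re.search(r"rout|rou", "routine").group(0)

# Inside square brackets "|" is just another character of the class,
# and the class as a whole matches exactly one character.
pipe_is_literal = re.fullmatch(r"[bit|bat|bot]", "|") is not None
class_matches_word = re.fullmatch(r"[bit|bat|bot]", "bit") is not None

# Parentheses make the extent of each alternative explicit.
grouped = [
    re.fullmatch(r"rou(tine|te)", word) is not None
    for word in ("routine", "route", "rou")
]
```

Here `first` is 'rou' while `longer` is 'rout', which is why the order of alternatives matters once the matched text is captured.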
Metacharacters - Subexpressions
The bracketing construct ( ... ) may also be used to define regular expression subexpressions. Subexpressions are numbered based on the left-to-right order of their opening parentheses. The first subexpression has number '1'.
Examples:
(int){8,10} - matches strings which contain 8, 9 or 10 consecutive instances of 'int'
routi([0-9]|n+)e - matches 'routi0e', 'routi1e', 'routine', 'routinne', 'routinnne' etc.
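Subexpression numbering is easiest to see with nested parentheses. The sketch below uses Python's `re` module as an assumed stand-in for the engine described here; the pattern `(rou)((t)(ine))` is a made-up illustration, not an example from the manual.

```python
import re

# Opening parentheses, left to right:
#   1: (rou)   2: ((t)(ine))   3: (t)   4: (ine)
m = re.fullmatch(r"(rou)((t)(ine))", "routine")
numbered = {n: m.group(n) for n in range(1, 5)}

# A quantifier after a subexpression repeats the whole group:
# (int){8,10} accepts 8, 9 or 10 consecutive copies of "int".
repeat_counts = {
    n: re.fullmatch(r"(int){8,10}", "int" * n) is not None
    for n in range(7, 12)
}
```

Under these assumptions `numbered` maps 1 to 'rou', 2 to 'tine', 3 to 't' and 4 to 'ine', and `repeat_counts` is true only for 8, 9 and 10 repetitions.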
Metacharacters - Backreferences
Metacharacters \1 through \9 are interpreted as backreferences. \<n> matches the previously matched subexpression #<n>.
Examples:
(.)\1+ - matches 'aaaa' and 'cc'.
(.+)\1+ - matches 'abab' and '123123'.
(['"]?)(\d+)\1 - matches "13" (in double quotes), or '4' (in single quotes) or 77 (without quotes) etc.
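The backreference examples can be tried out with the following Python sketch. It assumes Python's `re` module handles \1 and \d the same way as the engine described here; the helper `full` and the extra non-matching strings are illustrative additions.

```python
import re


def full(pattern, text):
    """True if `pattern` matches the whole of `text`."""
    return re.fullmatch(pattern, text) is not None


# \1 must repeat exactly what subexpression 1 captured.
repeated_char = [full(r"(.)\1+", s) for s in ("aaaa", "cc", "ab")]
repeated_run = [full(r"(.+)\1+", s) for s in ("abab", "123123", "abc")]

# Optional opening quote, digits, then the same quote again (or nothing).
QUOTED = r"""(['"]?)(\d+)\1"""
quoted = [full(QUOTED, s) for s in ('"13"', "'4'", "77", "\"13'")]

# Subexpression 2 holds the digits without the quotes.
digits = re.fullmatch(QUOTED, '"13"').group(2)
```

The last string in `quoted` opens with a double quote and closes with a single quote, so the backreference fails and the string is rejected.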