Viewing regular expression operators

Expressions in simple data patterns can be built using the following operators.

Range Operator

Expression Description
[ ] Contains a set of characters to match. Two chars separated by a "-" form a range that matches every char between the two endpoints, inclusive
[0-9] Will match any digit
[a-z] Will match a lowercase alphabetic char
[a-z0-9] Will match a lowercase alphanumeric char
^ The "Negate/Not" operator, placed first inside the range operator, matches anything except what is defined in the range. For example, [^0-9] will match anything but a digit char.
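
The range operators above can be tried out with a short sketch. It assumes the patterns behave as they do in Python's re module, which may differ in detail from the engine that evaluates simple data patterns. The sample strings are made up for illustration.

```python
import re

# Assumption: Python's re module stands in for the pattern engine.
digit = re.compile(r"[0-9]")            # any single digit
lower_alnum = re.compile(r"[a-z0-9]")   # lowercase letter or digit
non_digit = re.compile(r"[^0-9]")       # anything except a digit

# Hypothetical sample inputs.
print(digit.findall("a1b22"))        # ['1', '2', '2']
print(lower_alnum.findall("aB-3"))   # ['a', '3']
print(non_digit.findall("a1b22"))    # ['a', 'b']
```

Note that [a-z0-9] skips the uppercase "B" here. To accept both cases, the range would need to be written as [a-zA-Z0-9].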

Character Classes

Expression Description
. Will match any char
\b Will match word boundaries at the beginning or end of a sequence of alphanumeric characters
\B Will match non-word boundaries (opposite of \b)
\d Will match a digit char (shorthand for [0-9])
\D Will match non-digit char (shorthand for [^0-9])
\s Will match any whitespace char, including spaces, tabs, and newlines
\S Will match a non-whitespace char
\w Will match any "word" char (shorthand for [a-zA-Z0-9_])
\W Will match a non-word char
\xHH Will match the char whose hexadecimal code is HH
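
As a rough illustration of the character classes above, the following sketch again assumes Python re semantics. The sample text is hypothetical.

```python
import re

# Hypothetical sample text: a word, a space, digits, a tab, an identifier.
text = "cat 42\tx_1"

print(re.findall(r"\d", text))     # ['4', '2', '1']
print(re.findall(r"\s", text))     # [' ', '\t']
print(re.findall(r"\w+", text))    # ['cat', '42', 'x_1']

# "." matches any single char between "a" and "c".
print(re.findall(r"a.c", "abc a-c ac"))            # ['abc', 'a-c']

# \b keeps "cat" from matching inside "concat" or "cats".
print(re.findall(r"\bcat\b", "cat concat cats"))   # ['cat']

# \x41 is the hex code for "A".
print(re.findall(r"\x41", "ABA"))                  # ['A', 'A']
```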

Occurrence Operators

Expression Description
* Will match zero or more occurrences of the previous char or expression
+ Will match one or more occurrences of the previous char or expression
? Will match zero or one occurrence of the previous char or expression
{N} Will match exactly N occurrences of the previous char or expression
{N,M} Will match from N up to M occurrences of the previous char or expression
{N,} Will match at least N occurrences of the previous char or expression
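
The occurrence operators above can be checked with a small sketch, assuming Python re semantics. The helper name `full` is hypothetical and only tests whether an entire string matches a pattern.

```python
import re

def full(pattern: str, s: str) -> bool:
    """Hypothetical helper: True if the whole string matches the pattern."""
    return re.fullmatch(pattern, s) is not None

print(full(r"ab*", "a"))          # True:  "*" allows zero "b" chars
print(full(r"ab+", "a"))          # False: "+" needs at least one "b"
print(full(r"colou?r", "color"))  # True:  "?" makes the "u" optional
print(full(r"\d{3}", "123"))      # True:  exactly three digits
print(full(r"\d{3}", "1234"))     # False: four digits is too many
print(full(r"\d{2,4}", "12345"))  # False: more than four digits
print(full(r"\d{2,}", "12345"))   # True:  at least two digits
```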

Grouping Operator

Expression Description
( ) Can contain a set of characters which must all be present for the match to occur. Parentheses can contain character classes.
| The "Or" operator can be used to match against different sets of characters enclosed in parentheses. For example, (dog|cat) will match if dog or cat is present.
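
A brief sketch of grouping and the "Or" operator, assuming Python re semantics. The sample strings and the variable name `pets` are made up for illustration.

```python
import re

# (dog|cat) matches either word; the optional "s" follows the group.
pets = re.compile(r"(dog|cat)s?")

m = pets.search("two cats")
print(m.group(0))   # cats  (the whole match)
print(m.group(1))   # cat   (what the parentheses captured)

# An occurrence operator after a group applies to the whole group.
print(re.fullmatch(r"(ab)+", "ababab") is not None)   # True
print(re.fullmatch(r"(ab)+", "aba") is not None)      # False
```

Without the parentheses, the pattern ab+ would repeat only the "b", so grouping is what lets the whole sequence "ab" repeat.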

Flags

Expression Description
(?Lsu) Flag operator, written as a group that begins with "?" followed by one or more of the letters 'L', 's', and 'u'. Each letter sets a flag: L - locale dependent, s - dot matches all, u - Unicode
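
The effect of the 's' flag can be sketched as follows, assuming Python re semantics. In Python 3 the flag group must appear at the very start of the expression, and the 'L' flag is accepted only for bytes patterns, so this sketch uses only 's' and 'u'. The engine behind simple data patterns may apply different rules.

```python
import re

# Hypothetical sample: "a" and "b" separated by a newline.
sample = "a\nb"

# By default "." does not match a newline.
print(re.search(r"a.b", sample) is not None)      # False

# With the "s" flag, "." matches every char, including newlines.
print(re.search(r"(?s)a.b", sample) is not None)  # True

# Several flag letters can be combined in one flag group.
combined = re.compile(r"(?su)a.b")
print(bool(combined.flags & re.DOTALL))           # True
```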