Regular expression content classifiers

Regular expression (regex) patterns can be detected within content, such as the patterns found in U.S. Social Security numbers and credit card numbers.

You can define the patterns to search for using this screen.

When extracted text from a transaction is scanned, the system searches for strings that match the regular expression pattern and may be indicative of confidential information.

To create a regular expression classifier, complete the fields as follows:

Field Description
Name Enter a name for this pattern, such as Visa card.
Description Enter a description for this pattern, such as Visa credit card patterns.
Regular expression pattern.

Enter the regular expression for which you want the system to search, such as all 3-character strings followed by the sequence “123”. The expression should be compatible with Perl syntax.

You can use alphanumeric characters and any of the following values:


.                   Any single character
[ ]                 Any one character in the 
                    set
^                   Beginning of line
[^]                 Any one character not in the 
                    set
\s                  White-space character
|                   Or
\r?\n               Line break
\                   Escape special character
?                   Previous expression exists 
                    or not
{ }                 Range or frequency
()                  The expression in the 
                    parenthesis is treated as 
                    one term
\b                  Word boundary
\x{hex-number}      Unicode character
\d                  Digit character (0-9)
\D                  Non-digit character
\w                  Alphanumeric character
\W                  Non-alphanumeric character

To include Unicode characters in your pattern, use the format \X{hex- number}.

Do not use +, *, or {X,} without an upper limit. Instead use a limited quantifier such as {0,500}/{1,500}/{X,500}/{X}.When using a line break, use the exact syntax shown above.

For example: \b[a-zA-Z][347]\d{3}\b will match strings (separated with word boundaries) starting with a letter followed by 3, 4 or 7 and then 3 digits, like “c3122”.

Test

Because a regular expression pattern can be quite complex, it is important that you test the pattern before saving it. If improperly written, a pattern can create many false-positive incidents and slow down the system.

Create a .txt file (less than 1 MB) that contains values that match this regex pattern. The file must be in plain text UTF8 format.

Browse to the file and click Test to test the validity of your pattern syntax. If the pattern you entered is invalid, you’re given an opportunity to fix it. You cannot proceed until the test succeeds.