Regular expression content classifiers
The system can detect content that matches regular expression (regex) patterns, such as the patterns found in U.S. Social Security numbers and credit card numbers.
Use this screen to define the patterns to search for.
When the system scans text extracted from a transaction, it searches for strings that match the regular expression pattern, because such strings may indicate confidential information.
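To illustrate the matching step, the following Python sketch scans a string of extracted text for a simplified U.S. Social Security number shape. The pattern, the `scan_text` helper, and the sample text are hypothetical. Python's `re` module stands in for the product's Perl-compatible engine, and a real SSN classifier would typically validate matches more strictly than this three-part digit pattern does.

```python
import re

# Hypothetical classifier: a simplified U.S. SSN shape (3-2-4 digits).
# This only illustrates pattern matching; it does not validate real SSNs.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def scan_text(extracted_text: str, pattern: re.Pattern) -> list[str]:
    """Return every substring of the extracted text that matches the pattern."""
    return [m.group(0) for m in pattern.finditer(extracted_text)]


if __name__ == "__main__":
    text = "Employee 4471, SSN 123-45-6789, badge 12-3456."
    print(scan_text(text, SSN_PATTERN))  # ['123-45-6789']
```

Only `123-45-6789` has the full 3-2-4 digit shape, so it is the only string reported.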
To create a regular expression classifier, complete the fields as follows:
Field | Description |
---|---|
Name | Enter a name for this pattern, such as Visa card. |
Description | Enter a description for this pattern, such as Visa credit card patterns. |
Regular expression pattern | Enter the regular expression for which you want the system to search, such as all 3-character strings followed by the sequence “123”. The expression should be compatible with Perl syntax. You can use alphanumeric characters and any of the following values: To include Unicode characters in your pattern, use the format \x{hex-number}. Do not use +, *, or {X,}, which have no upper limit; instead, use a bounded quantifier such as {0,500}, {1,500}, {X,500}, or {X}. When using a line break, use the exact syntax shown above. For example, \b[a-zA-Z][347]\d{3}\b matches strings (delimited by word boundaries) that start with a letter, followed by 3, 4, or 7 and then three digits, such as “c3122”. |
Test | Because a regular expression pattern can be quite complex, test it before saving it. An improperly written pattern can create many false-positive incidents and slow down the system. Create a plain-text .txt file, smaller than 1 MB and encoded in UTF-8, that contains values matching this regex pattern. Browse to the file and click Test to check the validity of your pattern syntax. If the pattern you entered is invalid, you can correct it. You cannot proceed until the test succeeds. |
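The product performs its own validation when you click Test. The following Python sketch only approximates the rules described above, so that you can pre-check a pattern and a sample file locally. Assumptions: the helper names are hypothetical; the unbounded-quantifier check is a simple text scan (a heuristic, not a full regex parser); 1 MB is taken as 1,048,576 bytes; and Python's `re` module differs from Perl in some syntax, so a pattern that compiles here may still be rejected by the product, and vice versa.

```python
import re
from pathlib import Path

# Assumed size limit for the test file ("less than 1 MB").
MAX_TEST_FILE_BYTES = 1024 * 1024


def find_unbounded_quantifiers(pattern: str) -> list[str]:
    """Heuristically list +, *, and {X,} quantifiers that have no upper limit."""
    problems = []
    i = 0
    in_class = False
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\":
            i += 2  # skip the escaped character, e.g. \+ or \d
            continue
        if in_class:
            if ch == "]":
                in_class = False  # + and * inside [...] are literals
            i += 1
            continue
        if ch == "[":
            in_class = True
        elif ch in "+*":
            problems.append(ch)
        elif ch == "{":
            m = re.match(r"\{(\d+),\}", pattern[i:])  # {X,} has no upper bound
            if m:
                problems.append(m.group(0))
        i += 1
    return problems


def find_matches(pattern: str, text: str) -> list[str]:
    """Reject unbounded quantifiers, compile the pattern, and return all matches."""
    problems = find_unbounded_quantifiers(pattern)
    if problems:
        raise ValueError(f"use a bounded quantifier instead of: {problems}")
    compiled = re.compile(pattern)  # raises re.error if the syntax is invalid
    return [m.group(0) for m in compiled.finditer(text)]


def find_matches_in_file(pattern: str, path: str) -> list[str]:
    """Apply the size and UTF-8 rules to a test file, then match the pattern."""
    data = Path(path).read_bytes()
    if len(data) >= MAX_TEST_FILE_BYTES:
        raise ValueError("test file must be smaller than 1 MB")
    # Strict decoding raises UnicodeDecodeError if the file is not valid UTF-8.
    return find_matches(pattern, data.decode("utf-8"))


if __name__ == "__main__":
    example = r"\b[a-zA-Z][347]\d{3}\b"
    print(find_unbounded_quantifiers(example))             # []
    print(find_unbounded_quantifiers(r"\d+-\w*"))          # ['+', '*']
    print(find_matches(example, "ids: c3122, x5122, B7000"))  # ['c3122', 'B7000']
```

In the usage lines, `x5122` is not reported because its second character (5) is not 3, 4, or 7. A pattern such as `\d+` is rejected before it is compiled, which mirrors the advice to use a bounded form such as `\d{1,500}`.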