Dictionary content classifiers

A dictionary is a container for words and expressions pertaining to your business. To create a dictionary classifier, complete the fields as follows:

Field          Description
Name           Enter a name for this dictionary, such as Diseases.
Description    Enter a description for this dictionary, such as Disease names.
Dictionary Content

Dictionaries can have up to 100 phrases. To add content to the dictionary, click Add. Complete the fields on the resulting dialog box as follows:

  • Phrase: Enter a word or phrase to include. This phrase, when found in the content, affects whether the content is considered suspicious.
  • Weight: Select a weight from -999 to 999 (excluding 0). Together with a threshold, the weight determines how many instances of a phrase can be present, relative to other phrases, before a policy is triggered.

    Thresholds are defined on the policy’s Data Security tab.

    By default, if no weight is assigned, each phrase is given a weight of 1.

    For example, if the threshold is 100 and a phrase’s weight is 10, a web post can have 9 instances of that phrase (9 × 10 = 90) before a policy is triggered, provided no other phrases are matched. With the same threshold, if phrase A has a weight of 10 and phrase B has a weight of 5, 5 instances of phrase A and 10 instances of phrase B will trigger the policy (5 × 10 + 10 × 5 = 100).
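
The arithmetic in the example above can be sketched in a few lines of Python. This is only an illustration of the weighting idea, not the product's implementation: it assumes that the score is the sum of weight times match count over all matched phrases, and that a policy triggers once the score reaches the threshold, which is what the example implies. The function names `dictionary_score` and `triggers_policy` are hypothetical.

```python
def dictionary_score(match_counts, weights):
    """Sum weight * count for every matched phrase.

    Phrases without an assigned weight use the default weight of 1.
    Assumption: the score is a simple weighted sum, as the example implies.
    """
    return sum(count * weights.get(phrase, 1)
               for phrase, count in match_counts.items())


def triggers_policy(match_counts, weights, threshold):
    """Assumption: a policy triggers when the score reaches the threshold."""
    return dictionary_score(match_counts, weights) >= threshold


weights = {"phrase A": 10, "phrase B": 5}

print(triggers_policy({"phrase A": 9}, weights, 100))                   # False (score 90)
print(triggers_policy({"phrase A": 10}, weights, 100))                  # True  (score 100)
print(triggers_policy({"phrase A": 5, "phrase B": 10}, weights, 100))   # True  (score 100)
```

Under these assumptions, a negative weight lowers the score, so a phrase with a negative weight can offset matches of other phrases.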

Click OK and the phrase appears in the content list. You can add phrases one by one, or import them from a text file using the Import button described below.

Remove phrases by selecting them and clicking Remove.

Import

If you have many phrases to include, create a text file listing the phrases, then click Import and navigate to the text file.

The text file must be UTF-8 encoded. In the text file:

  • List each phrase on a separate line. The phrase can be up to 256 characters.
  • Optionally, provide one weight per phrase on the same line.
    • Separate the phrase and weight with a comma. Enclose the phrase in double quotes (quotes are not required if there is no weight). For example: "private information", 3
    • Valid weights are from -999 to 999, but you cannot assign a weight of 0.
    • If a phrase has no weight, it is assigned the default weight of 1.
  • Each phrase must be distinct. (Repeated values are ignored.)
  • You can include up to 100 unique phrases. If you include more, only the first 100 are added to the list. If the dictionary already contains phrases, only enough phrases to reach the 100-phrase limit are imported.
  • White spaces are ignored.
  • Slashes, tabs, hyphens, underscores, and carriage returns are included in the search.
  • Common words are also included.

Sample file, custom_dictionary.txt:

"confidential",5
"ProjectX",8
"ProjectY",3
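
Before importing a large file, it can help to check it against the rules listed above. The following Python sketch is one possible pre-import check, not part of the product. The function name `parse_dictionary_lines` and its error messages are hypothetical, and the parsing rules are assumptions based on the list above: a line that starts and ends its phrase with double quotes may carry a weight after a comma, any other line is treated as a whole phrase with the default weight of 1, and phrases are compared exactly as written.

```python
import re

MAX_PHRASES = 100         # limit per dictionary, from the rules above
MAX_PHRASE_LENGTH = 256   # maximum characters per phrase

# A quoted phrase, optionally followed by a comma and an integer weight.
QUOTED_LINE = re.compile(r'^"(.*)"\s*(?:,\s*(-?\d+))?$')


def parse_dictionary_lines(lines, existing_count=0):
    """Return (entries, errors) for the lines of an import file.

    entries maps each accepted phrase to its weight; errors lists
    human-readable problems. existing_count is the number of phrases
    already in the dictionary (hypothetical parameter).
    """
    entries = {}
    errors = []
    room = MAX_PHRASES - existing_count
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()            # surrounding white space is ignored
        if not line:
            continue
        match = QUOTED_LINE.match(line)
        if match:
            phrase = match.group(1)
            weight = int(match.group(2)) if match.group(2) else 1
        else:
            phrase, weight = line, 1  # unquoted line: no weight, default 1
        if len(phrase) > MAX_PHRASE_LENGTH:
            errors.append(f"line {number}: phrase longer than 256 characters")
            continue
        if weight == 0 or not -999 <= weight <= 999:
            errors.append(f"line {number}: weight {weight} is not allowed")
            continue
        if phrase in entries:
            continue                  # repeated values are ignored
        if len(entries) >= room:
            break                     # only the first phrases up to the limit are kept
        entries[phrase] = weight
    return entries, errors


sample = ['"confidential",5 ', '"ProjectX",8', '"ProjectY",3']
entries, errors = parse_dictionary_lines(sample)
print(entries)  # {'confidential': 5, 'ProjectX': 8, 'ProjectY': 3}
print(errors)   # []
```

To check a real file, pass it the lines of a file opened with `open(path, encoding="utf-8")`, which also raises an error if the file is not valid UTF-8.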

The phrases in this dictionary are case-sensitive

Select this check box if you want the phrases that you entered to be added to the dictionary with the same case you applied.

Each dictionary classifier is limited to 100 phrases.