Adding a dictionary classifier

Use the Patterns & Phrases > Dictionary Properties page in the Data Security module of the Forcepoint Security Manager to create a dictionary classifier from scratch or to edit an existing one.

A dictionary is a container for words and expressions belonging to the same language.

  • Many dictionaries are built into Forcepoint DLP. There are lists for medical conditions, financial terms, and more.
  • Administrators can also create or customize a dictionary list, then use it in policies, either as a classifier or an exception.

Policies can include a combination of classifier types. For example, a policy might include a regex classifier that identifies alphanumerical sequences found in part numbers, as well as a custom dictionary of part names to further identify risk. This helps to reduce false positives.
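As a rough illustration of why combining classifier types reduces false positives, the following Python sketch flags content only when both a regex and a dictionary match. It is not Forcepoint code: the part-number format, the part names, and the function name are hypothetical, and requiring both matches is an assumption made for the example.

```python
import re

# Hypothetical part-number format: two capital letters, a hyphen, four digits.
PART_NUMBER = re.compile(r"\b[A-Z]{2}-\d{4}\b")

# Hypothetical custom dictionary of part names (stored in lowercase).
PART_NAMES = {"rotor housing", "valve seat"}


def is_suspicious(text: str) -> bool:
    """Return True only when the regex and the dictionary both match."""
    has_number = PART_NUMBER.search(text) is not None
    lowered = text.lower()
    has_name = any(name in lowered for name in PART_NAMES)
    return has_number and has_name
```

With these assumed values, an invoice that mentions only "AB-1234" is not flagged, while a message that mentions both "AB-1234" and "rotor housing" is.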

To access the Dictionary Properties page:

  • To create a dictionary classifier from scratch, select New > Dictionary in the toolbar at the top of the Patterns & Phrases page.
  • To edit an existing dictionary classifier, select the classifier name in the Patterns & Phrases list.

To define or update the dictionary:

  1. Enter a Name for this dictionary, such as Diseases.
  2. Enter a Description for this dictionary, such as Disease terminology.
  3. Under List of phrases to include, use the Phrase field to enter a word or phrase to include, then click Add.

    Do this for each phrase to include until your list is complete. These phrases, when found in the content, affect whether the content is considered suspicious.

  4. For each phrase, select a Weight, from -999 to 999. When matched with a threshold, weight defines how many instances of a phrase can be present, in relation to other phrases, before triggering a policy.

    For example, if the threshold is 100 and a phrase’s weight is 10, an email message, Web post, or other destination can have 9 instances of that phrase before a policy is triggered, provided no other phrases are matched. If phrase A has a weight of 10 and phrase B has a weight of 5, 5 instances of phrase A and 10 instances of phrase B will trigger the policy (5 × 10 + 10 × 5 = 100).

    The system also leaves out the weights of excluded terms: matches that should be excluded, and are therefore not considered breaches, are not counted in the weight summation.

    By default, if no weight is assigned, each phrase is given a weight of 1.

    Thresholds are defined on the policy’s Condition tab.

  5. To create a dictionary containing many phrases more quickly, create a text file listing the phrases, then click Import and navigate to the text file.

    The text file must be in UTF-8 format. In the text file:

    • List each phrase on a separate line. The phrase can be up to 256 characters.
    • Optionally, provide one weight per phrase on the same line. Valid weights are from -999 to 999. If a phrase has no weight, it is assigned the default weight of 1.
    • Separate the phrase and weight with a comma, and enclose the phrase in quotation marks (not required if there is no weight). For example:

      "confidential",5
      "ProjectX",8
      "ProjectY",3

    • Each phrase must be distinct. (Repeated values are ignored.)
    • You can include up to 5000 unique phrases. If you include more, only the first 5000 will be added to the list.
    • Slashes, tabs, hyphens, underscores, and carriage returns are included in the search.
    • Common words are also included, unlike when fingerprint scans are performed.
  6. Indicate whether or not The phrases in this dictionary are case-sensitive.
  7. If you are editing a predefined dictionary, click Exclude to exclude certain values from the classifier, then:
    • Define the regex Pattern to exclude. Click the “i” icon for a list of valid values.
    • Enter a List of phrases to exclude, separated by commas. Click Add to add them to the list. These phrases, when found in combination with the dictionary phrases, affect whether the content is considered suspicious. Click Remove to remove selected strings from the list.
  8. Click OK.
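The import-file rules in step 5 and the weight arithmetic in step 4 can be sketched in Python as follows. This is an illustration only, not the product's implementation. The function names are hypothetical. Rejecting over-long phrases and out-of-range weights with an error is an assumption, as is treating a total equal to the threshold as a trigger (inferred from the example in step 4).

```python
import re

MAX_PHRASES = 5000            # only the first 5000 unique phrases are kept
MAX_PHRASE_LEN = 256          # maximum phrase length in characters
MIN_WEIGHT, MAX_WEIGHT = -999, 999
DEFAULT_WEIGHT = 1            # used when a line carries no weight

# A quoted phrase, optionally followed by a comma and an integer weight.
QUOTED_LINE = re.compile(r'^"(?P<phrase>.*)"\s*(?:,\s*(?P<weight>-?\d+))?$')


def parse_phrase_lines(lines):
    """Build a {phrase: weight} dict from the lines of an import file."""
    phrases = {}
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            continue
        match = QUOTED_LINE.match(line)
        if match:
            phrase = match.group("phrase")
            weight_text = match.group("weight")
            weight = DEFAULT_WEIGHT if weight_text is None else int(weight_text)
        else:
            # Unquoted line: the whole line is the phrase, with no weight.
            phrase, weight = line, DEFAULT_WEIGHT
        # Assumption: invalid entries are rejected rather than silently fixed.
        if len(phrase) > MAX_PHRASE_LEN:
            raise ValueError(f"phrase longer than {MAX_PHRASE_LEN} characters")
        if not MIN_WEIGHT <= weight <= MAX_WEIGHT:
            raise ValueError(f"weight {weight} is outside -999 to 999")
        if phrase in phrases:
            continue              # repeated values are ignored
        if len(phrases) >= MAX_PHRASES:
            break                 # anything past the limit is not added
        phrases[phrase] = weight
    return phrases


def total_weight(match_counts, phrases):
    """Sum weight x instance count over the matched dictionary phrases."""
    return sum(phrases[p] * n for p, n in match_counts.items() if p in phrases)


def triggers(match_counts, phrases, threshold):
    """Assumption: the policy triggers once the total reaches the threshold."""
    return total_weight(match_counts, phrases) >= threshold
```

To check a real file before importing it, open it with encoding="utf-8" and pass the file object to parse_phrase_lines. With weights of 10 for phrase A and 5 for phrase B and a threshold of 100, triggers returns False for 9 instances of A alone and True for 5 instances of A plus 10 instances of B.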