Sample validation script

There is a sample validation script in the \ValidationScripts directory where Forcepoint DLP is installed. The script covers the basic needs of most customers, such as preventing NULL or single-character values from being fingerprinted. You can modify it to suit your needs.

The sample package contains the following files:

  • default_validation.bat.sample - Sample validation script
  • validation_logic.py - Used by the sample validation script.
  • default_validation.ini.sample - Sample configuration file
  • default_validation.ini.sample - An additional configuration sample file
  • dictionary.txt - Sample dictionary file
  • in.csv - Sample input file
  • out.csv - Sample output file

The first three files are also included in the Forcepoint DLP installation package (the batch and ini files carry the sample extension).

The sample validation script is a production-grade script that is suitable for many organizations.

Although the “default_validation.bat” and “default_validation.ini” files can be renamed according to the conventions mentioned above, do not rename the “validation_logic.py” file. This file must be present in the \ValidationScripts directory (typically C:\Program Files\Websense\Data Security\ValidationScripts) in its original form.

The validation script is predefined to make sure Forcepoint DLP ignores:

  • Numbers smaller than 10,000.
  • Text strings containing fewer than 4 characters.
  • Strings containing only zeros (for example, “000000”).
  • Empty strings.
  • Placeholders (NULL and similar values).
  • Invalid SSNs in columns named “ssn”.
  • Invalid email addresses in columns named “email”.
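
The rules above can be pictured with a short Python sketch. This is not the code in “validation_logic.py”; it is a minimal illustration under stated assumptions. The function name should_ignore, the placeholder list, and both regular expressions are hypothetical and simplified, and the shipped script may apply the rules differently or in a different order.

```python
import re

# Hypothetical placeholder values; the shipped script's actual list may differ.
PLACEHOLDERS = {"null", "none", "n/a"}

# Simplified, illustrative patterns (assumptions, not the shipped expressions).
DIGITS_RE = re.compile(r"[0-9]+")
SSN_RE = re.compile(r"(?!000|666|9\d\d)\d{3}-?(?!00)\d{2}-?(?!0000)\d{4}")
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def should_ignore(value, column=""):
    """Return True if the value should be excluded from fingerprinting."""
    text = value.strip()
    if not text:                                  # empty strings
        return True
    if text.lower() in PLACEHOLDERS:              # NULL and similar values
        return True
    if set(text) == {"0"}:                        # strings containing only zeros
        return True
    is_number = DIGITS_RE.fullmatch(text) is not None
    if is_number and int(text) < 10000:           # numbers smaller than 10,000
        return True
    if not is_number and len(text) < 4:           # text shorter than 4 characters
        return True
    name = column.lower()
    if name == "ssn" and not SSN_RE.fullmatch(text):      # invalid SSNs
        return True
    if name == "email" and not EMAIL_RE.fullmatch(text):  # invalid email addresses
        return True
    return False
```

In this sketch, a value such as “9999” or “abc” is ignored regardless of the column, while “123-45-6789” is kept in a column named “ssn”.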

The following additions and changes can be configured through the “default_validation.ini” configuration file:

  • You can create a dictionary file that contains a list of strings for the validation script to remove. The file should be a line-delimited UTF-16 file, and its path should be written in the IgnoredDictionary configuration option in regular file system format (for example, c:\directory\dictionary.txt).

    Administrators can create UTF-16 files in Windows Notepad by saving the text with “Unicode” encoding.

    • An example of this can be found in the “default_validation.ini.sample” file.
    • A sample dictionary file, “dictionary.txt”, is also provided.
  • Regular expressions can be used to validate any column. To use this feature:
    • Add the column name, in lowercase, to the columns parameter. Separate multiple column names with semicolons.
    • Add a configuration section for the column by appending [column-name] to the file (again, in lowercase). This is the section header.
    • Add a RegExp parameter under the relevant (newly added) section header. Its value is a regular expression.
    • The “default_validation.ini.sample” file contains this type of validation for email addresses and social security numbers. These can be used as a reference.
      Note: Additional configuration options are available. Contact Forcepoint Technical Support for further assistance.
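
As a rough illustration of the configuration options described above, the following Python sketch parses an ini fragment and a dictionary file. It assumes a layout that is not confirmed by this topic: the [general] section name, the two regular expressions, and the helper names load_column_patterns and load_dictionary are all hypothetical. Treat “default_validation.ini.sample” as the authoritative reference for the real layout.

```python
import configparser
import re

# Hypothetical ini content. The [general] section name is an assumption;
# IgnoredDictionary, columns, and RegExp are the options named in this topic.
SAMPLE_INI = r"""
[general]
IgnoredDictionary = c:\directory\dictionary.txt
columns = email;ssn

[email]
RegExp = ^[^@\s]+@[^@\s]+\.[^@\s]+$

[ssn]
RegExp = ^\d{3}-\d{2}-\d{4}$
"""


def load_column_patterns(ini_text):
    """Map each configured column name to its compiled regular expression."""
    # Interpolation is disabled so the regular expressions are read verbatim.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    raw = parser["general"]["columns"]
    names = [name.strip() for name in raw.split(";") if name.strip()]
    return {name: re.compile(parser[name]["RegExp"]) for name in names}


def load_dictionary(path):
    """Read a line-delimited UTF-16 dictionary file into a set of strings."""
    # The "utf-16" codec honors the byte order mark that Windows Notepad
    # writes when a file is saved with "Unicode" encoding.
    with open(path, encoding="utf-16") as handle:
        return {line.strip() for line in handle if line.strip()}


patterns = load_column_patterns(SAMPLE_INI)
```

With this fragment, patterns holds one compiled expression for each of the “email” and “ssn” columns, and load_dictionary returns the strings that the validation script would be asked to remove.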