Sample validation script
A sample validation script is provided in the \ValidationScripts directory where Forcepoint DLP is installed. The script covers the basic capabilities most customers require, such as preventing NULL or single-character values from being fingerprinted. You can modify it to suit your needs.
The sample package contains the following files:
- default_validation.bat.sample - Sample validation script
- validation_logic.py - Used by the sample validation script
- default_validation.ini.sample - Sample configuration file
- default_validation.ini.sample - An additional configuration sample file
- dictionary.txt - Sample dictionary file
- in.csv - Sample input file
- out.csv - Sample output file
The first three files (with the .sample extension on the batch and .ini files) are also included in the Forcepoint DLP installation package.
The sample validation script is production-grade and suitable for many organizations.
Although the “default_validation.bat” and “default_validation.ini” files can be renamed according to the conventions mentioned above, do not rename the “validation_logic.py” file. It must remain in the \ValidationScripts directory (typically C:\Program Files\Websense\Data Security\ValidationScripts) in its original form.
By default, the validation script makes Forcepoint DLP ignore:
- Numbers smaller than 10,000.
- Text strings containing fewer than 4 characters.
- Strings containing only zeros (e.g., “000000”).
- Empty strings.
- Placeholders (NULL and similar values).
- Invalid SSNs in columns named “ssn.”
- Invalid email addresses in columns named “email.”
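The default rules above can be sketched in Python, the language of validation_logic.py. This is a minimal illustration under stated assumptions, not the shipped logic: the function name `should_ignore`, the placeholder set, the SSN and email patterns, and the order of the checks are all hypothetical.

```python
import re

# Hypothetical placeholder list; the real script's "NULL and similar values" may differ.
PLACEHOLDERS = {"null", "none", "nil", "n/a"}

# Illustrative patterns only, not the ones shipped with Forcepoint DLP.
SSN_RE = re.compile(r"\d{3}-?\d{2}-?\d{4}")
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def should_ignore(value: str, column: str) -> bool:
    """Return True if a cell value should be excluded from fingerprinting."""
    v = value.strip()
    if not v:                                # empty string
        return True
    if v.lower() in PLACEHOLDERS:            # NULL and similar placeholders
        return True
    if set(v) == {"0"}:                      # only zeros, e.g. "000000"
        return True
    try:
        # Numbers smaller than 10,000 (thousands separators stripped first).
        if float(v.replace(",", "")) < 10_000:
            return True
    except ValueError:
        pass                                 # not numeric; keep checking
    if len(v) < 4:                           # fewer than 4 characters
        return True
    col = column.strip().lower()
    if col == "ssn" and not SSN_RE.fullmatch(v):
        return True                          # invalid SSN in an "ssn" column
    if col == "email" and not EMAIL_RE.fullmatch(v):
        return True                          # invalid address in an "email" column
    return False
```

For example, `should_ignore("9999", "id")` and `should_ignore("12-345", "ssn")` both return True, while `should_ignore("alice@example.com", "email")` returns False.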
The following additions and changes can be configured through the “default_validation.ini” configuration file:
- You can create a dictionary file that lists strings for the validation script to remove. The file should be a line-delimited UTF-16 file, and its path name should be set in the IgnoredDictionary configuration option in regular file system format (for example, c:\directory\dictionary.txt).
Administrators can create UTF-16 files in Windows Notepad by saving the text with “Unicode” encoding.
- An example of this can be found in the “default_validation.ini.sample” file.
- A sample dictionary file—“dictionary.txt”—is also provided.
- Regular expressions can be used to validate any column. To use this feature:
- Add the column name, in lowercase, to the columns parameter. Separate multiple column names with semicolons.
- Add a configuration section for the column by appending [column-name] to the file (again, in lowercase). This is the section header.
- Add a RegExp parameter under the relevant (newly added) section header. Its value is a regular expression.
- The default_validation.ini.sample file contains this type of validation for email addresses and social security numbers, which can be used as a reference.
Note: Additional configuration options are available. Contact Forcepoint Technical Support for further assistance.
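The dictionary and per-column regular expression options can be sketched in Python as below. This is a minimal illustration under stated assumptions: the `[general]` section holding IgnoredDictionary and columns is a hypothetical placeholder (check default_validation.ini.sample for the actual layout), the helper names are invented, and a value is treated as invalid when the column's RegExp does not match it in full.

```python
import configparser
import re
import tempfile
from pathlib import Path

# Hypothetical layout; DICT_PATH is substituted with a real path at run time.
SAMPLE_INI = r"""
[general]
IgnoredDictionary = DICT_PATH
columns = email;ssn

[email]
RegExp = [^@\s]+@[^@\s]+\.[^@\s]+

[ssn]
RegExp = \d{3}-?\d{2}-?\d{4}
"""


def write_dictionary(path: Path, words: list[str]) -> None:
    """Write a line-delimited UTF-16 file, like Notepad's "Unicode" encoding."""
    with open(path, "w", encoding="utf-16") as f:  # codec writes a byte-order mark
        f.write("\n".join(words) + "\n")


def read_dictionary(path: Path) -> set[str]:
    """Read the dictionary back, skipping blank lines."""
    with open(path, encoding="utf-16") as f:
        return {line.strip() for line in f if line.strip()}


def load_rules(ini_text: str) -> tuple[dict[str, re.Pattern], set[str]]:
    cfg = configparser.ConfigParser(interpolation=None)  # regexes may contain '%'
    cfg.read_string(ini_text)
    columns = [c.strip().lower()
               for c in cfg.get("general", "columns", fallback="").split(";")
               if c.strip()]
    # Each listed column has its own [column-name] section with a RegExp option.
    patterns = {c: re.compile(cfg.get(c, "RegExp"))
                for c in columns if cfg.has_section(c)}
    dict_path = cfg.get("general", "IgnoredDictionary", fallback="")
    ignored = read_dictionary(Path(dict_path)) if dict_path else set()
    return patterns, ignored


def is_ignored(value: str, column: str, patterns, ignored) -> bool:
    if value in ignored:                     # dictionary entries are removed
        return True
    pattern = patterns.get(column.lower())   # columns without a RegExp pass through
    return pattern is not None and pattern.fullmatch(value) is None


# Demo: write a dictionary to a temporary folder and load the rules from it.
with tempfile.TemporaryDirectory() as tmp:
    dict_file = Path(tmp) / "dictionary.txt"
    write_dictionary(dict_file, ["N/A", "unknown"])
    patterns, ignored = load_rules(SAMPLE_INI.replace("DICT_PATH", str(dict_file)))

print(is_ignored("unknown", "name", patterns, ignored))       # True
print(is_ignored("123-45-678", "ssn", patterns, ignored))     # True
print(is_ignored("alice@example.com", "email", patterns, ignored))  # False
```

Interpolation is disabled because regular expressions commonly contain characters that configparser would otherwise try to expand.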