Creating a File Fingerprinting data pattern

The File Fingerprinting pattern type lets you create a fingerprint from one or more documents and then match other files against it by percentage of similarity.

This is useful when admins want to detect a form template or standardized document in their cloud applications but do not know the exact contents that users have filled in.

Setting up File Fingerprinting differs from most other pattern types: you generate the fingerprint signature locally with a downloadable script and then upload it back to the data pattern.

Steps

  1. Navigate to Protect > Objects > DLP Objects.
  2. Click the green plus icon and select File Fingerprinting to create the data pattern.
  3. Enter a Name and Description for the data pattern, then click the Match Criteria tab.
  4. Select the match percentage (confidence score) for the fingerprint and click Apply to display the Download File Fingerprinter link.
  5. Click the Download File Fingerprinter link to download the scripts that you will run on your file(s).
  6. The download contains the scripts that generate a fingerprint signature from your file(s); you then upload that signature back to this data pattern and set the confidence score. The folder contains two scripts, one for Linux and one for Windows.
    The fingerprinter supports Java 8 or 11 on Linux and Java 8 on Windows.
    • To run the script, first create a folder on your machine and place the file or files you want to fingerprint in it. The script processes the entire folder and creates a single fingerprint signature for all of the files in it, so one data pattern can check several files against the same confidence score. If you want different files to use different percentages, separate them into different folders and create a separate data pattern for each folder, so that each signature gets its own confidence score.
    • The confidence score is a percentage set in multiples of 10, and a file matches when its similarity to the fingerprinted content is equal to or greater than that percentage. For example, a confidence score of 70% matches files that share 70% or more of the content of the fingerprinted document(s) used to create the pattern.

      Note: Because the scripts use a third-party library (Apache Tika) to parse the files, memory requirements cannot be reliably predicted from file size. If Java runs out of memory, increase the Java heap size so that the Forcepoint ONE SSE scripts can fingerprint all of the files in your folder.

    Example command:
      $ ./run.sh -c fingerprint.ini -s <folder_to_scan> -o <path_to_archive>

    • This fingerprints all files in <folder_to_scan> recursively and saves the fingerprints to <path_to_archive>.
    • The zip file also contains a readme that walks you through running the script.
  7. To save the data pattern, click OK.
  8. (Optional) Click the Test Pattern tab to verify that the pattern is configured properly.
    1. Upload a file to check whether it matches the pattern.
    2. Click Test to run the test on your file.

      A verdict at the bottom of the dialog indicates whether the content matches the pattern.
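
Before running the fingerprinter in step 6, you can check that the installed Java meets the documented requirement (Java 8 or 11 on Linux). The following is a minimal bash sketch, not something shipped in the downloaded package; java_major_version, is_supported_on_linux and installed_java_version are hypothetical helper names. It relies only on java -version, which prints the version string to stderr.

```shell
#!/usr/bin/env bash
# Minimal sketch: check the local Java version against the documented
# fingerprinter requirement (Java 8 or 11 on Linux). Helper names are
# hypothetical and not part of the downloaded package.

# Print the major version for a string such as "1.8.0_292" or "11.0.12".
java_major_version() {
  local ver="$1"
  case "$ver" in
    1.*) ver="${ver#1.}" ;;  # legacy scheme (Java 8 and earlier): 1.8 -> 8
  esac
  echo "${ver%%[._-]*}"       # keep everything before the first . _ or -
}

# Succeed if the version is one the Linux fingerprinter supports (8 or 11).
is_supported_on_linux() {
  case "$(java_major_version "$1")" in
    8|11) return 0 ;;
    *)    return 1 ;;
  esac
}

# Read the version string from `java -version`, which writes to stderr,
# e.g. a line like: openjdk version "11.0.12" 2021-07-20
installed_java_version() {
  java -version 2>&1 | awk -F '"' '/version/ { print $2; exit }'
}
```

For example, is_supported_on_linux "$(installed_java_version)" && echo "Java OK" succeeds when Java 8 or 11 is first on the PATH. Versions up to Java 8 report the legacy 1.x form, which is why 1.8.0_292 maps to 8.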
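
The folder-per-signature workflow from step 6 can be sketched in bash as below. It assumes you run it from the extracted fingerprinter directory that contains run.sh and fingerprint.ini; stage_files, fingerprint_folder and the HEAP variable are hypothetical, and only the run.sh arguments come from the example command in step 6. To raise the heap size, the sketch passes -Xmx through JAVA_TOOL_OPTIONS, an environment variable the JVM reads at startup. That is an assumption about the most convenient route: if run.sh sets its own -Xmx, that value may take precedence, so check the bundled readme for the supported way to change it.

```shell
#!/usr/bin/env bash
# Minimal sketch of the step 6 workflow. Assumes it runs from the extracted
# fingerprinter directory that contains run.sh and fingerprint.ini.
# stage_files, fingerprint_folder and HEAP are hypothetical names.

# Copy the documents to fingerprint into one dedicated folder.
# Every file in this folder ends up in a single fingerprint signature.
stage_files() {
  local dest="$1"
  shift
  mkdir -p "$dest" || return 1
  cp -- "$@" "$dest"/
}

# Fingerprint a folder (recursively) into an archive path using the documented
# run.sh invocation. Refuses to run if the folder is missing or empty.
# If HEAP is set (e.g. HEAP=4g), it is passed to the JVM via JAVA_TOOL_OPTIONS;
# an -Xmx set inside run.sh, if any, may take precedence.
fingerprint_folder() {
  local src="$1" out="$2"
  if [ ! -d "$src" ] || [ -z "$(ls -A "$src")" ]; then
    echo "nothing to fingerprint in $src" >&2
    return 1
  fi
  if [ -n "${HEAP:-}" ]; then
    JAVA_TOOL_OPTIONS="-Xmx${HEAP}" ./run.sh -c fingerprint.ini -s "$src" -o "$out"
  else
    ./run.sh -c fingerprint.ini -s "$src" -o "$out"
  fi
}
```

For example, stage_files ./templates invoice_form.docx nda_form.pdf && HEAP=4g fingerprint_folder ./templates ./fingerprints would put both (hypothetical) documents into one signature. Documents that need a different confidence score belong in a separate folder, fingerprinted separately for a separate data pattern.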
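
The matching rule in step 6 (a file matches when its similarity is at or above the confidence score, which is set in multiples of 10) can be illustrated with a small bash check. This only shows the threshold comparison: how Forcepoint ONE SSE computes the similarity percentage is not described in this article, so the sketch takes it as an input. The function names are hypothetical, and the 10 to 100 range for valid scores is an assumption.

```shell
#!/usr/bin/env bash
# Minimal sketch of the threshold rule only. The similarity percentage is an
# input here; how the product computes it is not described in this article.
# Function names are hypothetical.

# A confidence score is valid if it is a multiple of 10
# (assumed range: 10 to 100).
is_valid_confidence() {
  [[ "$1" =~ ^[0-9]+$ ]] || return 1
  local c=$((10#$1))  # force base 10 so "070" is not read as octal
  (( c >= 10 && c <= 100 && c % 10 == 0 ))
}

# Succeed when the similarity (0-100) is at or above the confidence score.
# Returns 2 for invalid input, 1 for no match, 0 for a match.
matches_pattern() {
  local similarity="$1" confidence="$2"
  [[ "$similarity" =~ ^[0-9]+$ ]] || return 2
  is_valid_confidence "$confidence" || return 2
  (( 10#$similarity >= 10#$confidence ))
}
```

With a confidence score of 70, matches_pattern 70 70 and matches_pattern 85 70 succeed, while matches_pattern 69 70 fails, mirroring the "70% or greater" example in step 6.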