Machine learning

Machine learning classifiers are an advanced tool that lets administrators provide examples of the type of data to protect and the type of data not to protect. From these examples, Forcepoint DLP learns to identify sensitive data in traffic.

  • The examples of what to protect are called positive training sets.
  • The examples of what not to protect are called negative training sets.

Together, these examples train the system.
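
The idea of learning from positive and negative training sets can be illustrated with a generic text classifier. The sketch below is a minimal two-class multinomial naive Bayes model written only for illustration. It is an assumption, not a description of how Forcepoint DLP implements its classifiers. The class name `ToyNaiveBayes`, the tokenizer, and the sample documents are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase alphanumeric word tokens; deliberately simplistic.
    return re.findall(r"[a-z0-9]+", text.lower())


class ToyNaiveBayes:
    """Hypothetical two-class multinomial naive Bayes model (not Forcepoint's)."""

    def __init__(self, positive: list[str], negative: list[str]) -> None:
        self.word_counts = {"positive": Counter(), "negative": Counter()}
        self.doc_counts = {"positive": len(positive), "negative": len(negative)}
        for label, docs in (("positive", positive), ("negative", negative)):
            for doc in docs:
                self.word_counts[label].update(tokenize(doc))
        vocab = set(self.word_counts["positive"]) | set(self.word_counts["negative"])
        self.vocab_size = len(vocab)

    def log_score(self, text: str, label: str) -> float:
        counts = self.word_counts[label]
        total_words = sum(counts.values())
        # Log prior: share of training documents that carry this label.
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for word in tokenize(text):
            # Add-one (Laplace) smoothing keeps unseen words from zeroing the score.
            score += math.log((counts[word] + 1) / (total_words + self.vocab_size))
        return score

    def is_sensitive(self, text: str) -> bool:
        # Flag text that looks more like the positive set than the negative set.
        return self.log_score(text, "positive") > self.log_score(text, "negative")


# Hypothetical positive and negative training sets.
positive_set = [
    "Q3 proprietary marketing plan: launch budget and pricing strategy",
    "Confidential marketing plan for the new product launch and campaign budget",
]
negative_set = [
    "Public press release announcing our new office opening",
    "Cafeteria menu and parking information for employees",
]

clf = ToyNaiveBayes(positive_set, negative_set)
print(clf.is_sensitive("Draft pricing strategy and campaign budget for next launch"))  # True
print(clf.is_sensitive("Cafeteria menu and parking for employees"))  # False
```

Neither test sentence appears verbatim in the training sets. The model decides by comparing word statistics learned from each set, which is the general behavior the positive/negative terminology describes.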

Unlike fingerprinting, machine learning does not require the inspected content to contain parts of the actual files you want to protect; the content can instead look similar to those files or cover a similar topic. The system learns to recognize complex patterns and relationships, and it makes decisions without the exact include and exclude criteria that fingerprinting classifiers require. In this way, machine learning can protect even new, zero-day documents.
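
The difference between exact-fragment matching and similarity can be sketched with two toy measures. Here, shared 4-word shingles stand in for fingerprint-style exact matching, and cosine similarity of word-count vectors stands in for "looks similar". Both are simplified assumptions chosen for illustration; they are not Forcepoint's fingerprinting or machine learning algorithms, and the helper names and sentences are hypothetical.

```python
import math
import re
from collections import Counter


def words(text: str) -> list[str]:
    return re.findall(r"[a-z]+", text.lower())


def shingles(text: str, k: int = 4) -> set[tuple[str, ...]]:
    # Every run of k consecutive words: a crude stand-in for exact-fragment matching.
    w = words(text)
    return {tuple(w[i:i + k]) for i in range(len(w) - k + 1)}


def cosine(a: str, b: str) -> float:
    # Cosine similarity of word-count vectors: a crude stand-in for "looks similar".
    ca, cb = Counter(words(a)), Counter(words(b))
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


protected = "The marketing plan raises the campaign budget before the product launch"
paraphrase = "Before the launch we raise the product campaign budget in the marketing plan"
unrelated = "The cafeteria menu changes every week"

print(shingles(protected) & shingles(paraphrase))  # set(): no shared 4-word fragment
print(round(cosine(protected, paraphrase), 2))     # 0.89
print(round(cosine(protected, unrelated), 2))      # 0.3
```

The paraphrase shares no exact fragment with the protected text, so a purely exact-match check finds nothing, yet its similarity score is much higher than that of the unrelated sentence.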

Because machine learning classifiers do not look for an exact match, they can handle a larger number of files than fingerprinting classifiers.

Note: Machine learning classifiers can be used for unstructured file system data only. They cannot be used for database data or unstructured SharePoint or IBM Domino data.

After you create a classifier, the system estimates the expected number of unintended matches (false positives) and undetected content (false negatives), and it reports an accuracy level.
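
This page does not state how the accuracy level is calculated. As a hedged illustration of the quantities involved, the sketch below computes the false positive rate, false negative rate, and overall accuracy from a tally of hypothetical held-out files. The names `EvaluationCounts` and `error_rates`, the held-out evaluation, and the counts are assumptions. The sketch does not show how Forcepoint maps such numbers to its accuracy level.

```python
from dataclasses import dataclass


@dataclass
class EvaluationCounts:
    true_positives: int   # sensitive files the classifier matched
    false_negatives: int  # sensitive files it missed (undetected content)
    true_negatives: int   # non-sensitive files it correctly ignored
    false_positives: int  # non-sensitive files it matched (unintended matches)


def error_rates(c: EvaluationCounts) -> dict[str, float]:
    positives = c.true_positives + c.false_negatives
    negatives = c.true_negatives + c.false_positives
    total = positives + negatives
    return {
        "false_positive_rate": c.false_positives / negatives if negatives else 0.0,
        "false_negative_rate": c.false_negatives / positives if positives else 0.0,
        "accuracy": (c.true_positives + c.true_negatives) / total if total else 0.0,
    }


# Hypothetical counts from scoring 100 held-out positive and 200 held-out negative files.
print(error_rates(EvaluationCounts(true_positives=90, false_negatives=10,
                                   true_negatives=195, false_positives=5)))
# {'false_positive_rate': 0.025, 'false_negative_rate': 0.1, 'accuracy': 0.95}
```

Reporting the two error rates separately matters because a classifier can have high overall accuracy while still missing a meaningful share of sensitive files.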

The system supports three levels of machine learning classifiers:

  • Explicit negative examples, such as non-proprietary marketing plans as negative examples for proprietary marketing plans
  • Non-explicit negative examples, such as directories that do not contain marketing plans as negative examples for directories that contain proprietary marketing plans
  • Positive examples

For tips and best practices for using machine learning, see Introduction to Machine Learning for Forcepoint DLP on the Forcepoint support site.