Knowing when to use machine learning

Machine learning offers advantages and disadvantages compared with other Forcepoint DLP classification methods. It is important to assess whether machine learning is the best solution for a particular deployment.

Like any other decision system that handles complicated data, Forcepoint DLP machine learning may generate false positives (unintended matches) and false negatives (undetected matches). The fraction of all decisions that are false positives or false negatives is the error rate of the system; its accuracy is the complementary fraction of decisions that are correct.
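In standard terminology, the error rate is the fraction of all decisions that were false positives or false negatives, and accuracy is the fraction that were correct, so the two always add up to 1. The minimal Python sketch below shows that relationship using hypothetical counts, such as you might tally when reviewing classifier decisions against content whose sensitivity you already know. The class name, function names, and numbers are illustrative assumptions, not Forcepoint DLP output or APIs.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Hypothetical tallies from reviewing classifier decisions against known labels."""
    true_positives: int   # sensitive content correctly matched
    false_positives: int  # non-sensitive content matched by mistake
    true_negatives: int   # non-sensitive content correctly left unmatched
    false_negatives: int  # sensitive content that went undetected

    def total(self) -> int:
        return (self.true_positives + self.false_positives
                + self.true_negatives + self.false_negatives)


def error_rate(c: ConfusionCounts) -> float:
    """Fraction of all decisions that were false positives or false negatives."""
    if c.total() == 0:
        raise ValueError("no decisions to evaluate")
    return (c.false_positives + c.false_negatives) / c.total()


def accuracy(c: ConfusionCounts) -> float:
    """Fraction of all decisions that were correct; equals 1 - error_rate."""
    if c.total() == 0:
        raise ValueError("no decisions to evaluate")
    return (c.true_positives + c.true_negatives) / c.total()


# Illustrative numbers only, not measured Forcepoint DLP results.
counts = ConfusionCounts(true_positives=90, false_positives=5,
                         true_negatives=880, false_negatives=25)
print(f"error rate: {error_rate(counts):.3f}")  # (5 + 25) / 1000
print(f"accuracy:   {accuracy(counts):.3f}")    # (90 + 880) / 1000
```

In this example the accuracy is 97 percent, yet 25 of the 115 sensitive items went undetected. When sensitive content is rare, a single accuracy figure can hide many false negatives, so it is worth examining false positives and false negatives separately.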

The accuracy of machine learning depends on the properties of the data it learns from, and finding the best data sets can be challenging. For this reason, before choosing machine learning, administrators may want to determine whether other types of classifiers, such as fingerprinting or predefined policies, are sufficient to classify and protect their data.

One case where machine learning can be most effective is distinguishing proprietary from non-proprietary source code. Source code that is under constant development is hard to fingerprint because it changes continually, and predefined policies cannot tell proprietary source code from non-proprietary source code.

Forcepoint DLP provides several predefined content types that address common use cases, including source code (in C, C++, Java, Perl, and F#), patents, software design documents, and documents related to financial investments. To protect content of these types, consider using machine learning, and make sure that you select the appropriate predefined content type.

Machine learning can also complement and enhance fingerprinting, predefined policies, and other Forcepoint DLP detection and classification methods.