Data classification

With Forcepoint DLP, administrators can use several methods to classify data:

  • Use predefined scripts, dictionaries, file-types, and regular expression (regex) patterns to start classifying data right away.
    • Regex patterns are used to identify alphanumeric strings of a certain format, such as 123-45-6789.
    • File properties classifiers identify data by file name, type, or size.
  • Create customized scripts, dictionaries, file-types, regular expression patterns, and key phrases for specific (described) data. As a shortcut, edit a predefined classifier, then save it with a new name.
  • Fingerprint (register) data. The power of fingerprinting is its ability to detect sensitive information despite manipulation, reformatting, or other modification. Fingerprints enable the protection of whole or partial documents, antecedents, and derivative versions of the protected information, as well as snippets of the protected information whether cut and pasted or retyped.

    The system can fingerprint two types of data: structured (databases) and unstructured (files and folders).

  • Create machine learning classifiers by providing examples of the type of data that should and should not be protected, so the system can learn to identify sensitive data in traffic. These examples are called positive and negative training sets because they educate the system.
    • Unlike fingerprinting, detected files do not need to contain parts of the analyzed files; they can simply look similar or be on a similar topic.
    • The system learns and recognizes complex patterns and relationships and makes decisions based on them, without the exact include/exclude criteria that fingerprinting classifiers specify.
    • Machine learning can even protect new, zero-day documents.
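
As an illustration of the regex method described above, the following is a minimal Python sketch of a pattern that matches strings in the 123-45-6789 format. It is not taken from Forcepoint DLP: the pattern, the name NNN_NN_NNNN, and the function find_matches are assumptions made for this example, and the product's predefined patterns may be defined differently.

```python
import re

# Hypothetical pattern (not a Forcepoint DLP classifier): three digits,
# a hyphen, two digits, a hyphen, four digits, as in 123-45-6789.
# The lookarounds reject candidates embedded in a longer run of digits.
NNN_NN_NNNN = re.compile(r"(?<!\d)\d{3}-\d{2}-\d{4}(?!\d)")


def find_matches(text: str) -> list[str]:
    """Return every substring of text that has the 3-2-4 digit format."""
    return NNN_NN_NNNN.findall(text)


if __name__ == "__main__":
    sample = "Employee record: 123-45-6789, order 1234-56-78901."
    print(find_matches(sample))  # ['123-45-6789']
```

The second number in the sample is skipped because it sits inside longer digit runs. A format-only match like this can still produce false positives, which is one reason a pattern is often combined with other classifiers.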

For more information on content classification methods, including which are most and least accurate, see the Classifying Content section.