How matches are counted

In rules that use a database fingerprinting classifier, the number of matches is the number of records in the fingerprinted database that match the analyzed transaction. If a combination of phrases occurs more than once in the fingerprinted database, a single occurrence of that combination in the transaction still counts as only 1 match.

For example, consider the following table:

Column_A    Column_B
1234        AAAA
1234        AAAA
5678        AAAA

Assume a condition that specifies the combination of Column_A and Column_B.

  • The text “1234 AAAA” produces a match count of 1. Two records contain this combination, but it appears only once in the text.
  • The text “1234 AAAA 1234 AAAA” produces a match count of 2. Two records with this combination were fingerprinted, and the combination appears twice in the text.
  • The text “AAAA 1234 5678” produces a match count of 2. Two different records match, and the text that matches each record is not identical, even though AAAA appears only once. This is because the text might read “the following people have AAAA: 1234 and 5678”, in which case AAAA applies to several records.
  • The text “1234 AAAA 1234 AAAA 1234 AAAA” produces a match count of 2. Although the combination appears 3 times in the text, only 2 records (duplicates of each other) are leaked.
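
The counting rules illustrated by the four examples above can be approximated with a short sketch. This is a simplified model for illustration only, not the product's implementation: the function name count_matches and the RECORDS list are hypothetical, the text is assumed to be split on whitespace, and proximity or ordering of values in the text is ignored. Under those assumptions, each distinct record contributes the smaller of two numbers: how many copies of it exist in the fingerprinted database, and how many times all of its values can be found in the text.

```python
from collections import Counter

# Hypothetical copy of the example table: (Column_A, Column_B) per record.
RECORDS = [("1234", "AAAA"), ("1234", "AAAA"), ("5678", "AAAA")]


def count_matches(records, text):
    """Approximate the match count for a text (simplified model, see assumptions)."""
    # Assumption: values are whitespace-separated tokens in the text.
    token_counts = Counter(text.split())
    # Number of copies of each distinct record in the fingerprinted database.
    record_copies = Counter(tuple(record) for record in records)

    total = 0
    for record, copies in record_copies.items():
        # How many times every value of this record is available in the text.
        # A value shared by different records (such as AAAA) may be reused
        # across those records, but not within repeats of the same record.
        needed = Counter(record)
        occurrences = min(token_counts[value] // n for value, n in needed.items())
        # Duplicates in the database never count more than they appear in the
        # text, and repeats in the text never count more than the database holds.
        total += min(copies, occurrences)
    return total


for sample in (
    "1234 AAAA",
    "1234 AAAA 1234 AAAA",
    "AAAA 1234 5678",
    "1234 AAAA 1234 AAAA 1234 AAAA",
):
    print(sample, "->", count_matches(RECORDS, sample))
```

Run against the example table, this sketch yields 1, 2, 2, and 2 for the four texts, in the same order as the list above.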

The fingerprint repository itself generates high match counts for duplicates. A verification step then removes any matches that do not fit the logic above.