How matches are counted
In rules with a database fingerprinting classifier, the number of matches is defined as the number of records in the fingerprinted database that match the analyzed transaction. If a combination of phrases occurs more than once in the fingerprinted database, a single occurrence of that combination in the transaction does not account for more than 1 match.
For example, consider the following table:
Column_A | Column_B |
---|---|
1234 | AAAA |
1234 | AAAA |
5678 | AAAA |
Assume a condition that specifies the combination of Column_A and Column_B:
- The text “1234 AAAA” produces a match count of 1. Two records contain this combination, but it appears only once in the text.
- The text “1234 AAAA 1234 AAAA” produces a match count of 2. Two records with this combination were fingerprinted, and the combination appears twice in the text.
- The text “AAAA 1234 5678” produces a match count of 2. Two different records match (1234 AAAA and 5678 AAAA), and the parts of the text that match them are not identical, even though AAAA appears only once. This is because the text may state “the following people have AAAA: 1234 and 5678”, in which case AAAA linguistically applies to several records.
- The text “1234 AAAA 1234 AAAA 1234 AAAA” produces a match count of 2. Although the combination appears 3 times in the text, only 2 records (duplicates of each other) can have been leaked.
The fingerprint repository itself generates high match counts for duplicates. A verification step is added that removes matches that do not fit the logic above.
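The counting rule illustrated by the four examples can be reproduced with a small toy model. This is only a sketch under stated assumptions, not the product's actual matching algorithm: it assumes text is split on whitespace, that a value shared by several records (such as AAAA) can be reused across different records, and that each distinct combination contributes the smaller of its number of occurrences in the text and its number of copies in the fingerprinted database. The names `RECORDS` and `count_matches` are hypothetical and exist only for this illustration.

```python
from collections import Counter

# Fingerprinted records from the example table (Column_A, Column_B).
RECORDS = [("1234", "AAAA"), ("1234", "AAAA"), ("5678", "AAAA")]


def count_matches(text, records=RECORDS):
    """Toy model of the match count (an assumption, not the real engine)."""
    # Assumption: whitespace tokenization is enough for this example.
    tokens = Counter(text.split())
    total = 0
    # Group duplicate records so each distinct combination is handled once.
    for record, copies in Counter(records).items():
        # A combination occurs as often as its rarest value occurs in the text.
        occurrences = min(tokens[value] for value in record)
        # Never count more matches than there are fingerprinted copies.
        total += min(occurrences, copies)
    return total


for sample in (
    "1234 AAAA",
    "1234 AAAA 1234 AAAA",
    "AAAA 1234 5678",
    "1234 AAAA 1234 AAAA 1234 AAAA",
):
    print(repr(sample), "->", count_matches(sample))
```

Under these assumptions the four sample texts yield 1, 2, 2, and 2, which agrees with the match counts listed above.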