Positive examples
For effective machine learning to occur, it is most important to select the best positive examples.
- These are textual examples of the data to protect.
- The documents in this set should be related to the same theme or share other
commonalities.
Without the commonalities, the learning algorithm will not be able to find a way to categorize the data.
The required number of examples depends on the level of commonality. If the positive examples share many common terms that are very rare, in general, a small number suffices. On the other hand, if the differences between the positive and the negative set are more subtle, more examples will be required. A positive set typically consists of 100–200 text documents.