Positive examples

Selecting good positive examples is the most important factor in effective machine learning.

Positive examples are text documents that represent the data to protect.
The documents in this set should relate to the same theme or share other commonalities.
Without such commonalities, the learning algorithm cannot find a way to categorize the data.

The required number of examples depends on the level of commonality. If the positive examples share many terms that are otherwise very rare, a small number of examples generally suffices. If, on the other hand, the differences between the positive and the negative set are more subtle, more examples are required. A positive set typically consists of 100–200 text documents.
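
One rough way to gauge the level of commonality before training is to look for terms that appear in most positive documents but in few negative ones. The sketch below is an illustration only, not the learning algorithm itself: the function names, the 0.8 and 0.2 thresholds, and the sample documents are all assumptions made for this example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a document and return its set of distinct word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def document_frequency(docs):
    """Fraction of documents in which each term appears at least once."""
    counts = Counter()
    for doc in docs:
        counts.update(tokenize(doc))
    return {term: n / len(docs) for term, n in counts.items()}


def distinctive_terms(positive_docs, negative_docs, min_pos=0.8, max_neg=0.2):
    """Terms common in the positive set but rare in the negative set.

    min_pos and max_neg are illustrative thresholds, not product settings.
    """
    pos_df = document_frequency(positive_docs)
    neg_df = document_frequency(negative_docs)
    return sorted(
        term
        for term, freq in pos_df.items()
        if freq >= min_pos and neg_df.get(term, 0.0) <= max_neg
    )


# Hypothetical sample documents, invented for this sketch.
positive = [
    "Merger agreement draft: confidential terms for Project Falcon",
    "Project Falcon merger timeline and confidential valuation",
    "Confidential merger notes, Project Falcon steering committee",
]
negative = [
    "Cafeteria menu for the week",
    "Project status update for the website redesign",
    "Reminder: submit your timesheet by Friday",
]

terms = distinctive_terms(positive, negative)
print(terms)
```

With these sample sets the result is "confidential", "falcon", and "merger". The word "project" is dropped because it also appears in one of the three negative documents. A long list of such terms suggests that a small positive set may be enough; a short or empty list suggests that the differences are subtle and more examples are needed.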