Positive examples

For machine learning to be effective, the most important step is selecting good positive examples.
  • These are textual examples of the data to protect.
  • The documents in this set should be related to the same theme or share other commonalities.

    Without such commonalities, the learning algorithm has no basis for categorizing the data.

The number of examples required depends on how much the documents have in common. If the positive examples share many terms that are otherwise very rare, a small number generally suffices. If the differences between the positive and negative sets are more subtle, more examples are required. A positive set typically consists of 100–200 text documents.
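
To get a rough feel for how much a candidate positive set has in common, one option is to list the terms that appear in most positive documents but in few negative ones. The sketch below is only an illustration and is not how the product's learning algorithm works. The function names, the thresholds (min_pos, max_neg), and the sample documents are all assumptions made up for this example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a document and return its set of distinct word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def document_frequency(docs):
    """Return, for each term, the fraction of documents that contain it."""
    counts = Counter()
    for doc in docs:
        counts.update(tokenize(doc))
    return {term: n / len(docs) for term, n in counts.items()}


def distinctive_terms(positive_docs, negative_docs, min_pos=0.8, max_neg=0.1):
    """Terms common in the positive set but rare in the negative set.

    min_pos and max_neg are illustrative thresholds, not product settings.
    """
    pos_df = document_frequency(positive_docs)
    neg_df = document_frequency(negative_docs)
    return sorted(
        term
        for term, freq in pos_df.items()
        if freq >= min_pos and neg_df.get(term, 0.0) <= max_neg
    )


# Hypothetical sample documents, invented for this sketch.
positive = [
    "Merger agreement draft: Project Falcon term sheet",
    "Project Falcon due diligence checklist",
    "Falcon project valuation memo",
]
negative = [
    "Cafeteria menu for the week",
    "Project status update for the website redesign",
    "Holiday schedule memo",
]

print(distinctive_terms(positive, negative))  # ['falcon']
```

Here "project" occurs in every positive document but also in a negative one, so only "falcon" is reported. A set that yields many such terms probably needs fewer examples. A set that yields few or none suggests subtler differences, and therefore a larger positive set.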