Saturday, October 14, 2006

Errors in binary classification

"There are two ways to be fooled. One is to believe what isn’t true; the other is to refuse to believe what is true." - Søren Kierkegaard (1813-55)

Any classification procedure may make mistakes. In a binary classification, two kinds of errors may be distinguished. These go by different names in different disciplines, but the underlying concept is the same.
In all cases the problem is to develop a test, procedure or search which will distinguish between cases where a proposition or hypothesis is true and cases where it is false:






          | True                               | False
Positive  | True Positive (TP)                 | False Positive (FP): Type 1 error
Negative  | False Negative (FN): Type 2 error  | True Negative (TN)


Examples of such propositions might be:
  • The person has malaria
  • The email is spam
  • The word is spelt correctly
  • The article is relevant
  • The accused is guilty
  • The traveller is a terrorist - FIA watchlist
  • The student is of honours quality
Various measures are used to define the quality of the procedure:

  • Efficiency = (TP + TN) / (TP + FP + TN + FN) = (TP + TN) / All
  • Precision = TP / (TP + FP) = TP / Positives
  • Recall = TP / (TP + FN) = TP / True
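As a rough illustration, the measures above can be computed from a list of outcomes. This is only a sketch: the function names and the ten-email data set are made up for the example, efficiency is taken to mean the fraction of correct results, (TP + TN) / All, and the error rate (FP + FN) / All is shown alongside it. It also assumes none of the denominators is zero.

```python
from collections import Counter

def confusion_counts(actual, predicted):
    """Count TP, FP, FN and TN from two parallel lists of booleans.

    actual[i]    -- whether the proposition is really true for case i
    predicted[i] -- whether the test came back positive for case i
    """
    counts = Counter()
    for truth, positive in zip(actual, predicted):
        if positive and truth:
            counts["TP"] += 1
        elif positive and not truth:
            counts["FP"] += 1   # Type 1 error
        elif not positive and truth:
            counts["FN"] += 1   # Type 2 error
        else:
            counts["TN"] += 1
    return counts

def quality_measures(c):
    """Return the quality measures for a Counter of TP/FP/FN/TN."""
    total = c["TP"] + c["FP"] + c["FN"] + c["TN"]
    return {
        "efficiency": (c["TP"] + c["TN"]) / total,   # fraction correct
        "error_rate": (c["FP"] + c["FN"]) / total,   # fraction wrong
        "precision": c["TP"] / (c["TP"] + c["FP"]),
        "recall": c["TP"] / (c["TP"] + c["FN"]),
    }

# Hypothetical data: ten emails, True means "the email is spam".
actual    = [True, True, True, True, False, False, False, False, False, False]
predicted = [True, True, True, False, True, False, False, False, False, False]

counts = confusion_counts(actual, predicted)
measures = quality_measures(counts)
print(dict(counts))
print(measures)
```

With this made-up data there are 3 true positives, 1 false positive, 1 false negative and 5 true negatives, giving an efficiency of 0.8 and a precision and recall of 0.75 each.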
The two kinds of error are usually in conflict: we can tune the procedure to decrease the risk of false negatives, but that will probably increase the number of false positives. For example, in screening procedures for terrorists, the aim would be to reduce the number of false negatives (terrorists getting in), but that will create many false positives (innocent travellers picked out as terrorists).
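One way to see the conflict is to imagine a procedure that gives each case a score and flags everything at or above some threshold. The sketch below assumes such a scoring procedure exists; the scores, the thresholds and the function name `errors_at` are all invented for illustration.

```python
# Hypothetical scores (0 to 1) from some screening procedure, each paired
# with whether the proposition is actually true for that case.
scored = [
    (0.95, True), (0.80, True), (0.70, False), (0.60, True),
    (0.40, False), (0.35, True), (0.20, False), (0.10, False),
]

def errors_at(threshold, cases):
    """Return (false positives, false negatives) when every case scoring
    at or above the threshold is flagged as positive."""
    fp = sum(1 for score, truth in cases if score >= threshold and not truth)
    fn = sum(1 for score, truth in cases if score < threshold and truth)
    return fp, fn

# Lowering the threshold flags more cases: fewer misses, more false alarms.
for threshold in (0.9, 0.5, 0.3, 0.05):
    fp, fn = errors_at(threshold, scored)
    print(f"threshold={threshold:.2f}  FP={fp}  FN={fn}")
```

On these made-up scores, the strictest setting (0.9) gives no false positives but three false negatives, while the most lenient (0.05) gives no false negatives but four false positives.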

The costs of these two types of errors will be seen differently by different stakeholders in a system, leading to inherent conflict in systems design.
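To make this concrete, here is a small sketch in which two stakeholders put different costs on each kind of error and so prefer different settings of the same procedure. The stakeholders, the cost figures and the error counts are all hypothetical, chosen only to show the effect.

```python
# Hypothetical results of one procedure at three settings:
# (label, number of false positives, number of false negatives)
settings = [("strict", 4, 0), ("moderate", 1, 1), ("lenient", 0, 3)]

# Hypothetical cost each stakeholder attaches to a single error of each kind.
stakeholders = {
    "security agency": {"fp": 1, "fn": 100},
    "traveller": {"fp": 10, "fn": 1},
}

def preferred_setting(costs, options):
    """Return the label of the option with the lowest total cost."""
    def total_cost(option):
        _, fp, fn = option
        return costs["fp"] * fp + costs["fn"] * fn
    return min(options, key=total_cost)[0]

choices = {
    name: preferred_setting(costs, settings)
    for name, costs in stakeholders.items()
}
print(choices)
```

With these figures the security agency prefers the strict setting and the traveller prefers the lenient one, even though both are looking at exactly the same error counts.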
