iad: Errors in binary classification

"There are two ways to be fooled. One is to believe what isn’t true; the other is to refuse to believe what is true." - Søren Kierkegaard (1813-55)

Any classification procedure may make mistakes. In a binary classification, two kinds of errors may be distinguished. These have different names in different disciplines but the underlying concept is the same:

In Statistics: Type 1 / Type 2 errors, or alpha and beta errors
In Testing: false positive, false negative
In Information Retrieval: Recall and Precision

In all cases the problem is to develop a test, procedure or search which will distinguish between cases where a proposition or hypothesis is true or false:

	Proposition true	Proposition false
Test positive	True Positive (TP)	False Positive (FP): Type 1 error
Test negative	False Negative (FN): Type 2 error	True Negative (TN)
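
As a rough illustration of how the four cells get filled in, here is a minimal Python sketch that tallies them from paired observations. The function name confusion_counts and the spam data are made up for the example; they are not from any particular library.

```python
from collections import Counter


def confusion_counts(actual, predicted):
    """Tally TP, FP, FN and TN from paired booleans.

    actual[i]    -- whether the proposition is really true for case i
    predicted[i] -- whether the test came back positive for case i
    """
    counts = Counter()
    for truth, positive in zip(actual, predicted):
        if positive and truth:
            counts["TP"] += 1
        elif positive and not truth:
            counts["FP"] += 1  # Type 1 error
        elif not positive and truth:
            counts["FN"] += 1  # Type 2 error
        else:
            counts["TN"] += 1
    return {cell: counts[cell] for cell in ("TP", "FP", "FN", "TN")}


# Hypothetical data: is each email really spam, and did the filter flag it?
is_spam = [True, True, False, False, True, False]
flagged = [True, False, True, False, True, False]

counts = confusion_counts(is_spam, flagged)
print(counts)
```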

Examples of such propositions might be:

The person has malaria
The email is spam
The word is spelt correctly
The article is relevant
The accused is guilty
The traveller is a terrorist - FIA watchlist
The student is of honours quality

Various measures are used to define the quality of the procedure:

Error rate = (FP + FN) / (TP + FP + TN + FN) = (FP + FN) / All (efficiency, or accuracy, is 1 minus this: (TP + TN) / All)
Precision = TP / (TP + FP) = TP / All Positives
Recall = TP / (TP + FN) = TP / All True
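
To make the arithmetic concrete, the following Python sketch computes the misclassified fraction (FP + FN) / All, precision and recall from the four counts. The function name quality_measures and the counts passed to it are invented for illustration only.

```python
import math


def ratio(numerator, denominator):
    """Divide, returning NaN when the denominator is zero (measure undefined)."""
    return numerator / denominator if denominator else math.nan


def quality_measures(tp, fp, fn, tn):
    """Compute the measures above from the four confusion-table counts."""
    total = tp + fp + fn + tn
    return {
        "error_rate": ratio(fp + fn, total),  # misclassified / All
        "precision": ratio(tp, tp + fp),      # TP / all test positives
        "recall": ratio(tp, tp + fn),         # TP / all cases really true
    }


# Hypothetical counts for 100 cases.
measures = quality_measures(tp=40, fp=10, fn=20, tn=30)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts, 30 of the 100 cases are misclassified, 40 of the 50 positives are correct, and 40 of the 60 true cases are found.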

These two errors are usually in conflict: we can tune the procedure to decrease the risk of False Negatives, but that will probably increase the number of False Positives. For example, in screening procedures for terrorists, the aim would be to reduce the number of false negatives (terrorists getting in), but that will create many false positives (innocent travellers picked out as terrorists).
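
One way to see the conflict is to imagine the procedure producing a risk score and flagging every case at or above a threshold. The Python sketch below assumes such a score exists; the scores, the thresholds and the name errors_at are all hypothetical.

```python
# Hypothetical cases: (risk score in [0, 1], whether the proposition is really true).
scored = [
    (0.95, True), (0.80, False), (0.70, True), (0.55, False),
    (0.40, True), (0.30, False), (0.20, False), (0.10, False),
]


def errors_at(threshold, cases):
    """Return (false positives, false negatives) when flagging score >= threshold."""
    fp = sum(1 for score, truth in cases if score >= threshold and not truth)
    fn = sum(1 for score, truth in cases if score < threshold and truth)
    return fp, fn


# Lowering the threshold flags more cases: fewer misses, more false alarms.
for threshold in (0.9, 0.6, 0.3):
    fp, fn = errors_at(threshold, scored)
    print(f"threshold {threshold}: {fp} false positives, {fn} false negatives")
```

On this toy data a strict threshold misses two true cases and raises no false alarms, while a lenient one misses none but flags three innocent cases.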

The costs of these two types of errors will be seen differently by different stakeholders in a system, leading to inherent conflict in systems design.

Saturday, October 14, 2006
