The aim of this post is to clarify when Precision is the right metric to use, as opposed to the False Positive Rate (FPR).
Two of the most popular metric pairs for evaluating machine learning models on two-class (binary) classification problems are the following:
Precision vs. Recall
FPR vs. True positive rate (TPR)
The definitions of the above metrics are as follows:
Recall: \(\frac{TP}{TP+FN}\)
Precision: \(\frac{TP}{TP+FP}\)
TPR: \(\frac{TP}{TP+FN}\) (= Recall)
FPR: \(\frac{FP}{FP+TN}\)
Please note that from the above definitions, it is clear that Recall and TPR are nothing but the same. The purpose of this article is to highlight situations where Precision is the right metric and where FPR is the right one.
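As a minimal sketch, these definitions translate directly into code. The function and variable names below are illustrative, not taken from any particular library:

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute Precision, Recall/TPR, and FPR from confusion-matrix counts."""
    precision = tp / (tp + fp)  # of all predicted positives, how many are truly positive
    recall = tp / (tp + fn)     # of all actual positives, how many were found (same as TPR)
    fpr = fp / (fp + tn)        # of all actual negatives, how many were flagged as positive
    return precision, recall, fpr
```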
Assume a binary classification problem with a balanced test set of 100 class 0 samples and 100 class 1 samples, and a classifier that achieves an accuracy of 80%, with errors split evenly across the two classes. This corresponds to a TPR of 80% and an FPR of 20%, giving the following confusion-matrix counts:
TP = 80, FN = 20
TN = 80, FP = 20
Hence, for this case the precision would be 80%, matching the TPR and consistent with the 20% FPR.
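Plugging the balanced-set counts into the sketch above confirms these numbers (again, purely an illustrative check):

```python
precision, recall, fpr = classification_metrics(tp=80, fp=20, tn=80, fn=20)
print(precision, recall, fpr)  # 0.8, 0.8, 0.2
```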
Now, assume that the classes are imbalanced in the test set such that there are 100 class 1 samples but 200 class 0 samples. Considering the same TPR and FPR of 80% and 20% respectively, we would have the following metrics:
Accuracy = 80% (accuracy is a class-weighted average of TPR and TNR = 1 − FPR; since both equal 80% here, it stays at 80% regardless of the class balance)
TP = 80, FN = 20
TN = 160, FP = 40
In the above case, the precision now degrades to ~67%.
Skewing the class imbalance further, to a test set with 100 class 1 samples but 1000 class 0 samples, while maintaining the same TPR of 80% and FPR of 20%, the standard metrics become:
Accuracy = 80%
TP = 80, FN = 20
TN = 800, FP = 200
On this very skewed set, the precision degrades further to ~29%.
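The same trend can be reproduced with a short sweep over class ratios. The sketch below (illustrative only) holds TPR = 0.8 and FPR = 0.2 fixed and shows precision falling as the number of class 0 samples grows, while accuracy stays flat:

```python
# Hold TPR and FPR fixed; vary the number of negatives (class 0 samples).
TPR, FPR = 0.8, 0.2
positives = 100
for negatives in (100, 200, 1000):
    tp = TPR * positives          # true positives found among class 1
    fn = positives - tp           # class 1 samples missed
    fp = FPR * negatives          # class 0 samples wrongly flagged
    tn = negatives - fp           # class 0 samples correctly rejected
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (positives + negatives)
    print(f"negatives={negatives:5.0f}  precision={precision:.2f}  accuracy={accuracy:.2f}")

# negatives=  100  precision=0.80  accuracy=0.80
# negatives=  200  precision=0.67  accuracy=0.80
# negatives= 1000  precision=0.29  accuracy=0.80
```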
In conclusion, TPR-FPR is the metric pair of choice for evaluating machine learning model performance on both balanced and imbalanced test sets. Precision, however, is an important metric to track in applications where the number of false positives incurred per true positive matters, e.g., breast cancer detection on mammogram images. Thus, Precision-Recall can still be used on imbalanced sets, but with the caveat that Precision is likely to be low when the negative class dominates the positive class. When the test set is balanced, Precision-Recall can safely be used just like TPR-FPR.