Sensitivity + Specificity + PPV + TP/FP/TN/FN Formulas

Formulas for sensitivity, specificity, PPV, NPV, TPR, FPR, prevalence, etc.

Posted on August 2, 2022 • Tags: statistics AI models probability

A brief cheat sheet / reference guide containing the definitions, formulas, and explanations of the most commonly used model evaluation metrics for binary classification tasks.

Quickstart

| Name | Formula | Definition |
| --- | --- | --- |
| Sensitivity / Recall / Hit Rate / TPR | $\frac{TP}{TP + FN}$ | $P(\hat{y} = 1 \mid y = 1)$ |
| Specificity / Selectivity / TNR / 1 - FPR | $\frac{TN}{TN + FP}$ | $P(\hat{y} = 0 \mid y = 0)$ |
| PPV / Precision | $\frac{TP}{TP + FP}$ | $P(y = 1 \mid \hat{y} = 1)$ |
| NPV | $\frac{TN}{TN + FN}$ | $P(y = 0 \mid \hat{y} = 0)$ |
| FDR | $\frac{FP}{TP + FP}$ | $P(y = 0 \mid \hat{y} = 1)$ |
| FPR | $\frac{FP}{FP + TN}$ | $P(\hat{y} = 1 \mid y = 0)$ |

Note: Prevalence impacts PPV/NPV, but does not impact Sensitivity/Specificity/AUROC.
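As a minimal Python sketch of the quickstart table (the function name and the example counts below are made up for illustration, not taken from any particular library):

```python
def confusion_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute the quickstart metrics from raw confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # TPR / recall: P(y_hat = 1 | y = 1)
        "specificity": tn / (tn + fp),  # TNR: P(y_hat = 0 | y = 0)
        "ppv": tp / (tp + fp),          # precision: P(y = 1 | y_hat = 1)
        "npv": tn / (tn + fn),          # P(y = 0 | y_hat = 0)
        "fdr": fp / (tp + fp),          # P(y = 0 | y_hat = 1) = 1 - PPV
        "fpr": fp / (fp + tn),          # P(y_hat = 1 | y = 0) = 1 - specificity
    }

# Made-up counts purely for illustration
print(confusion_metrics(tp=40, fp=5, tn=45, fn=10))
# sensitivity 0.80, specificity 0.90, PPV ≈ 0.89, NPV ≈ 0.82, FDR ≈ 0.11, FPR 0.10
```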

Background

We have a binary classification model $M$.

We have a set of inputs $X$, and a set of corresponding labels $Y$.

The goal of our model is to predict the label for a given input.

In other words, given some input $x_i \in X$, the model makes a binary prediction $\hat{y_i}$ of 0 or 1. Unknown to our model, the input $x_i$ has an associated ground truth label $y_i \in Y$.

Ideally, $y_i = \hat{y_i}$ for every $i$.

Example

Let’s say that we want to predict whether a patient has a disease or not.

We develop a binary classification model for this task.

We say that $y_i = 1$ if patient $i$ has the disease, or $y_i = 0$ if patient $i$ does not have the disease.

We say that $\hat{y_i} = 1$ if the model predicts that patient $i$ has the disease, otherwise $\hat{y_i} = 0$.
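As a tiny illustrative sketch (the patient labels and predictions below are invented), the encoding looks like this:

```python
# Ground truth: 1 = patient has the disease, 0 = patient does not
y_true = [1, 0, 0, 1, 0, 1, 0, 0]

# Model predictions, aligned with y_true by patient index
y_pred = [1, 0, 1, 0, 0, 1, 0, 0]

# Ideally y_true[i] == y_pred[i] for every patient i; here patients 2 and 3
# are misclassified (one false positive and one false negative, respectively).
misclassified = [i for i, (y, y_hat) in enumerate(zip(y_true, y_pred)) if y != y_hat]
print(misclassified)  # [2, 3]
```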

Notation

  • $x_i$ = input example
  • $X$ = set of all inputs
  • $Y$ = set of all ground truth labels
  • $N = \lvert X \rvert = \lvert Y \rvert$ = total number of inputs (note that $N$ is also used below for the count of ground truth negatives; the meaning should be clear from context)
  • $\hat{y_i} \in \{0, 1\}$ = the prediction of our model $M$ for a specific $x_i$
  • $y_i \in \{0, 1\}$ = the ground truth for a specific $x_i$ (i.e. the value we are trying to predict)
  • $\mathbb{I}(\text{boolean}) \in \{0, 1\}$ = an indicator function. It evaluates to 1 if the boolean expression inside of it is TRUE; otherwise, it evaluates to 0 (see the sketch after this list).
  • $y = 0$ is referred to as a “negative” true outcome
  • $y = 1$ is referred to as a “positive” true outcome
  • $\hat{y} = 0$ is referred to as a “negative” prediction
  • $\hat{y} = 1$ is referred to as a “positive” prediction
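A one-line Python version of the indicator function (the helper name is just for illustration):

```python
def indicator(condition: bool) -> int:
    """I(condition): 1 if the boolean expression is TRUE, otherwise 0."""
    return 1 if condition else 0

print(indicator(3 > 2))  # 1
print(indicator(3 < 2))  # 0
```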

Counts (TP / TN / FP / FN)

| Name | Definition | Is prediction correct? | Interpretation |
| --- | --- | --- | --- |
| True Positives (TP) | $\sum_{i = 1}^N \mathbb{I}(y_i = \hat{y_i} \land y_i = 1)$ | Yes | Number of times your model predicted $\hat{y_i} = 1$ when $y_i = 1$ |
| True Negatives (TN) | $\sum_{i = 1}^N \mathbb{I}(y_i = \hat{y_i} \land y_i = 0)$ | Yes | Number of times your model predicted $\hat{y_i} = 0$ when $y_i = 0$ |
| False Positives (FP) | $\sum_{i = 1}^N \mathbb{I}(y_i \ne \hat{y_i} \land y_i = 0)$ | No | Number of times your model predicted $\hat{y_i} = 1$ when $y_i = 0$ |
| False Negatives (FN) | $\sum_{i = 1}^N \mathbb{I}(y_i \ne \hat{y_i} \land y_i = 1)$ | No | Number of times your model predicted $\hat{y_i} = 0$ when $y_i = 1$ |
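These definitions translate directly into sums of indicators. A sketch in Python, reusing the invented labels and predictions from the example above:

```python
y_true = [1, 0, 0, 1, 0, 1, 0, 0]  # ground truth labels
y_pred = [1, 0, 1, 0, 0, 1, 0, 0]  # model predictions

# Each count is a sum of indicator functions over all N examples
tp = sum(1 for y, y_hat in zip(y_true, y_pred) if y == y_hat and y == 1)
tn = sum(1 for y, y_hat in zip(y_true, y_pred) if y == y_hat and y == 0)
fp = sum(1 for y, y_hat in zip(y_true, y_pred) if y != y_hat and y == 0)
fn = sum(1 for y, y_hat in zip(y_true, y_pred) if y != y_hat and y == 1)

print(tp, tn, fp, fn)  # 2 4 1 1
```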

Counts (Ground Truth / Predictions)

| Name | Definition | Formula | Interpretation |
| --- | --- | --- | --- |
| Ground Truth Positives ($P$) | $\sum_{i = 1}^N \mathbb{I}(y_i = 1)$ | $TP + FN$ | Number of examples in your dataset where $y_i = 1$ |
| Ground Truth Negatives ($N$) | $\sum_{i = 1}^N \mathbb{I}(y_i = 0)$ | $FP + TN$ | Number of examples in your dataset where $y_i = 0$ |
| Predicted Positives ($\hat{P}$) | $\sum_{i = 1}^N \mathbb{I}(\hat{y_i} = 1)$ | $TP + FP$ | Number of times the model predicted $\hat{y_i} = 1$ |
| Predicted Negatives ($\hat{N}$) | $\sum_{i = 1}^N \mathbb{I}(\hat{y_i} = 0)$ | $TN + FN$ | Number of times the model predicted $\hat{y_i} = 0$ |
| Prevalence ($p$) | $\frac{1}{N} \sum_{i = 1}^N \mathbb{I}(y_i = 1)$ | $\frac{P}{P + N} = \frac{TP + FN}{TP + FN + TN + FP}$ | Proportion of examples in your dataset where $y_i = 1$ |
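Continuing the same invented example, the ground-truth and prediction totals (and prevalence) follow directly, and the identities in the Formula column can be checked:

```python
y_true = [1, 0, 0, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 0, 0]
tp, tn, fp, fn = 2, 4, 1, 1  # counts computed as in the previous sketch

p_total = sum(y_true)               # ground truth positives (P)
n_total = len(y_true) - p_total     # ground truth negatives (N)
p_hat = sum(y_pred)                 # predicted positives (P-hat)
n_hat = len(y_pred) - p_hat         # predicted negatives (N-hat)
prevalence = p_total / len(y_true)  # proportion of positive examples

assert p_total == tp + fn and n_total == fp + tn
assert p_hat == tp + fp and n_hat == tn + fn
print(prevalence)  # 0.375
```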

Sensitivity / Specificity

| Name | Definition | Formula #1 | Formula #2 | Formula #3 | Interpretation |
| --- | --- | --- | --- | --- | --- |
| Sensitivity / Recall / Hit Rate / True Positive Rate (TPR) | $P(\hat{y} = 1 \mid y = 1)$ | $\frac{TP}{TP + FN}$ | $\frac{TP}{P}$ | $1 - FNR$ | 5% sensitivity => 5% of positive patients will test positive. |
| Specificity / Selectivity / True Negative Rate (TNR) | $P(\hat{y} = 0 \mid y = 0)$ | $\frac{TN}{FP + TN}$ | $\frac{TN}{N}$ | $1 - FPR$ | 5% specificity => 5% of negative patients will test negative. |
| False Positive Rate (FPR) | $P(\hat{y} = 1 \mid y = 0)$ | $\frac{FP}{FP + TN}$ | $\frac{FP}{N}$ | $1 - TNR$ | 5% FPR => 5% of patients that are negative will test positive. |
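A worked example with made-up counts (TP = 40, FN = 10, TN = 45, FP = 5):

$$\text{Sensitivity} = \frac{TP}{TP + FN} = \frac{40}{40 + 10} = 0.80 \qquad \text{Specificity} = \frac{TN}{FP + TN} = \frac{45}{5 + 45} = 0.90 \qquad FPR = 1 - TNR = 0.10$$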

Predictive Values

| Name | Definition | Formula #1 | Formula #2 | Formula #3 | Interpretation |
| --- | --- | --- | --- | --- | --- |
| Precision / Positive Predictive Value (PPV) | $P(y = 1 \mid \hat{y} = 1)$ | $\frac{TP}{TP + FP}$ | $\frac{TP}{\hat{P}}$ | $1 - FDR$ | 5% PPV => 5% of patients that test positive are actually positive. |
| Negative Predictive Value (NPV) | $P(y = 0 \mid \hat{y} = 0)$ | $\frac{TN}{TN + FN}$ | $\frac{TN}{\hat{N}}$ | $1 - FOR$ | 5% NPV => 5% of patients that test negative are actually negative. |
| False Discovery Rate (FDR) | $P(y = 0 \mid \hat{y} = 1)$ | $\frac{FP}{TP + FP}$ | $\frac{FP}{\hat{P}}$ | $1 - PPV$ | 5% FDR => 5% of patients that test positive are actually negative. |
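With the same made-up counts (TP = 40, FP = 5, TN = 45, FN = 10):

$$PPV = \frac{TP}{TP + FP} = \frac{40}{40 + 5} \approx 0.89 \qquad NPV = \frac{TN}{TN + FN} = \frac{45}{45 + 10} \approx 0.82 \qquad FDR = 1 - PPV \approx 0.11$$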

Interpretation

Sensitivity: Ability to detect disease if a person has it. (Source)

Specificity: Ability to exclude people without disease. (Source)

PPV: How likely is it that someone with a positive result actually has the disease? (Source)

NPV: How likely is it that someone with a negative result truly does not have the disease? (Source)

When to Use

From Geeky Medics:

Test with high specificity => rule in the outcome when the prediction is positive

  • If high specificity, then $P(\hat{y} = 0 \mid y = 0) \approx 1$
  • Thus, getting a result of $\hat{y} = 1$ probably means that $y = 1$
  • Since if $y = 0$ then we would have gotten $\hat{y} = 0$

Test with high sensitivity => rule out the outcome when the prediction is negative (both rules are illustrated numerically in the sketch after the list below)

  • If high sensitivity, then $P(\hat{y} = 1 \mid y = 1) \approx 1$
  • Thus, getting a result of $\hat{y} = 0$ probably means that $y = 0$
  • Since if $y = 1$ then we would have gotten $\hat{y} = 1$
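A small Bayes-rule sketch of both rules in Python (the sensitivity, specificity, and prevalence values are invented for illustration; note that the resulting predictive values also depend on prevalence, as discussed in the next section):

```python
def ppv(sens: float, spec: float, prev: float) -> float:
    """P(y = 1 | y_hat = 1) via Bayes' rule."""
    return sens * prev / (sens * prev + (1 - spec) * (1 - prev))

def npv(sens: float, spec: float, prev: float) -> float:
    """P(y = 0 | y_hat = 0) via Bayes' rule."""
    return spec * (1 - prev) / (spec * (1 - prev) + (1 - sens) * prev)

# High-specificity test: a positive result strongly suggests disease ("rule in")
print(ppv(sens=0.70, spec=0.99, prev=0.20))  # ~0.95

# High-sensitivity test: a negative result strongly suggests no disease ("rule out")
print(npv(sens=0.99, spec=0.70, prev=0.20))  # ~0.996
```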

Prevalence

  • Sensitivity and specificity are not impacted by prevalence
    • They are properties of the test, not of the population
  • PPV and NPV are impacted by prevalence (see the sketch below)
    • Higher prevalence -> Higher PPV
    • At low prevalence -> We expect NPV > PPV
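A quick sketch of the prevalence effect, holding sensitivity and specificity fixed at made-up values: by Bayes' rule, $PPV = \frac{\text{sens} \cdot p}{\text{sens} \cdot p + (1 - \text{spec})(1 - p)}$ rises with prevalence $p$, and at low prevalence NPV exceeds PPV.

```python
def ppv(sens, spec, prev):
    # P(y = 1 | y_hat = 1) via Bayes' rule
    return sens * prev / (sens * prev + (1 - spec) * (1 - prev))

def npv(sens, spec, prev):
    # P(y = 0 | y_hat = 0) via Bayes' rule
    return spec * (1 - prev) / (spec * (1 - prev) + (1 - sens) * prev)

sens, spec = 0.90, 0.90  # fixed test characteristics
for prev in [0.01, 0.10, 0.50]:
    print(f"prev={prev:.2f}  PPV={ppv(sens, spec, prev):.3f}  NPV={npv(sens, spec, prev):.3f}")
# prev=0.01  PPV=0.083  NPV=0.999
# prev=0.10  PPV=0.500  NPV=0.988
# prev=0.50  PPV=0.900  NPV=0.900
```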

References