# Sensitivity + Specificity + PPV + TP/FP/TN/FN Formulas

## Formulas for sensitivity, specificity, PPV, NPV, TPR, FPR, prevalence, etc.

A brief cheat sheet / reference guide containing the definitions, formulas, and explanations of the most commonly used model evaluation metrics for binary classification tasks.

# Quickstart

| Name | Formula | Definition |
|---|---|---|
| Sensitivity / Recall / Hit Rate / TPR | $\frac{TP}{TP + FN}$ | $P(\hat{y} = 1 \mid y = 1)$ |
| Specificity / Selectivity / TNR / 1 - FPR | $\frac{TN}{TN + FP}$ | $P(\hat{y} = 0 \mid y = 0)$ |
| PPV / Precision | $\frac{TP}{TP + FP}$ | $P(y = 1 \mid \hat{y} = 1)$ |
| NPV | $\frac{TN}{TN + FN}$ | $P(y = 0 \mid \hat{y} = 0)$ |
| FDR | $\frac{FP}{TP + FP}$ | $P(y = 0 \mid \hat{y} = 1)$ |
| FPR | $\frac{FP}{FP + TN}$ | $P(\hat{y} = 1 \mid y = 0)$ |

Note: Prevalence impacts PPV/NPV, but does not impact Sensitivity/Specificity/AUROC.
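As a quick sanity check, the table above can be computed directly from the four confusion-matrix counts. This is an illustrative sketch (function and key names are our own, not from any particular library):

```python
def metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute the cheat-sheet metrics from raw confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # TPR / recall
        "specificity": tn / (tn + fp),  # TNR
        "ppv": tp / (tp + fp),          # precision
        "npv": tn / (tn + fn),
        "fdr": fp / (tp + fp),          # complement of PPV
        "fpr": fp / (fp + tn),          # complement of specificity
    }

# Example counts: sensitivity = 80/100 = 0.8, specificity = 90/100 = 0.9
print(metrics(tp=80, fp=10, tn=90, fn=20))
```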

# Background

We have a binary classification model $M$.

We have a set of inputs $X$, and a set of corresponding labels $Y$.

The goal of our model is to predict the label for a given input.

In other words, given some input $x_i \in X$, the model makes a binary prediction $\hat{y_i}$ of 0 or 1. Unknown to our model, the input $x_i$ has an associated ground truth label $y_i \in Y$.

Ideally, $y_i = \hat{y_i}$.

**Example**

Let’s say that we want to predict whether a patient has a disease or not.

We develop a binary classification model for this task.

We say that $y_i = 1$ if patient $i$ has the disease, or $y_i = 0$ if patient $i$ does not have the disease.

We say that $\hat{y_i} = 1$ if the model predicts that patient $i$ has the disease, otherwise $\hat{y_i} = 0$.

**Notation**

- $x_i$ = input example
- $X$ = set of all inputs
- $Y$ = set of all ground truth labels
- $N = \lvert X \rvert = \lvert Y \rvert$ = total number of inputs
- $\hat{y_i} \in \{0, 1\}$ = the prediction of our model $M$ for a specific $x_i$
- $y_i \in \{0, 1\}$ = the ground truth for a specific $x_i$ (i.e. the value we are trying to predict)
- $\mathbb{I}(\text{boolean}) \in \{0, 1\}$ = an indicator function. It evaluates to 1 if the boolean expression inside it is TRUE; otherwise, it evaluates to 0.
- $y = 0$ is referred to as a “negative” true outcome
- $y = 1$ is referred to as a “positive” true outcome
- $\hat{y} = 0$ is referred to as a “negative” prediction
- $\hat{y} = 1$ is referred to as a “positive” prediction

# Counts (TP / TN / FP / FN)

| Name | Definition | Is prediction correct? |
|---|---|---|
| True Positives (TP) | Number of times your model predicted $\hat{y_i} = 1$ when $y_i = 1$ | Yes |
| True Negatives (TN) | Number of times your model predicted $\hat{y_i} = 0$ when $y_i = 0$ | Yes |
| False Positives (FP) | Number of times your model predicted $\hat{y_i} = 1$ when $y_i = 0$ | No |
| False Negatives (FN) | Number of times your model predicted $\hat{y_i} = 0$ when $y_i = 1$ | No |
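Using the indicator-function notation above, each count is a sum of $\mathbb{I}(\cdot)$ over the dataset. A minimal sketch (names are illustrative):

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN from paired ground-truth labels and predictions."""
    tp = sum(1 for y, yh in zip(y_true, y_pred) if y == 1 and yh == 1)
    tn = sum(1 for y, yh in zip(y_true, y_pred) if y == 0 and yh == 0)
    fp = sum(1 for y, yh in zip(y_true, y_pred) if y == 0 and yh == 1)
    fn = sum(1 for y, yh in zip(y_true, y_pred) if y == 1 and yh == 0)
    return tp, tn, fp, fn

y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]
print(confusion_counts(y_true, y_pred))  # (2, 2, 1, 1)
```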

# Counts (Ground Truth / Predictions)

| Name | Definition | Formula |
|---|---|---|
| Ground Truth Positives ($P$) | Number of examples in your dataset where $y_i = 1$ | $TP + FN$ |
| Ground Truth Negatives ($N$) | Number of examples in your dataset where $y_i = 0$ | $TN + FP$ |
| Predicted Positives ($\hat{P}$) | Number of times the model predicted $\hat{y_i} = 1$ | $TP + FP$ |
| Predicted Negatives ($\hat{N}$) | Number of times the model predicted $\hat{y_i} = 0$ | $TN + FN$ |
| Prevalence ($p$) | Proportion of examples in your dataset where $y_i = 1$ | $\frac{P}{P + N}$ |

# Sensitivity / Specificity

| Name | Formula #1 | Formula #2 | Interpretation |
|---|---|---|---|
| Sensitivity / Recall / Hit Rate / True Positive Rate (TPR) | $\frac{TP}{TP + FN}$ | $\frac{TP}{P}$ | 5% sensitivity => 5% of positive patients will test positive. |
| Specificity / Selectivity / True Negative Rate (TNR) | $\frac{TN}{FP + TN}$ | $\frac{TN}{N}$ | 5% specificity => 5% of negative patients will test negative. |
| False Positive Rate (FPR) | $\frac{FP}{FP + TN}$ | $\frac{FP}{N}$ | 5% FPR => 5% of negative patients will test positive. |

# Predictive Values

| Name | Formula #1 | Formula #2 | Interpretation |
|---|---|---|---|
| Precision / Positive Predictive Value (PPV) | $\frac{TP}{TP + FP}$ | $\frac{TP}{\hat{P}}$ | 5% PPV => 5% of patients that test positive are actually positive. |
| Negative Predictive Value (NPV) | $\frac{TN}{TN + FN}$ | $\frac{TN}{\hat{N}}$ | 5% NPV => 5% of patients that test negative are actually negative. |
| False Discovery Rate (FDR) | $\frac{FP}{TP + FP}$ | $\frac{FP}{\hat{P}}$ | 5% FDR => 5% of patients that test positive are actually negative. |
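Note that the pairs above are complements: PPV and FDR share the denominator $\hat{P}$ and sum to 1, just as specificity and FPR share $N$ and sum to 1. A quick check with arbitrary illustrative counts:

```python
# Arbitrary illustrative counts, not from any real dataset.
tp, fp, tn, fn = 30, 12, 50, 8

ppv = tp / (tp + fp)          # P(y = 1 | y_hat = 1)
fdr = fp / (tp + fp)          # P(y = 0 | y_hat = 1)
specificity = tn / (tn + fp)  # P(y_hat = 0 | y = 0)
fpr = fp / (fp + tn)          # P(y_hat = 1 | y = 0)

assert abs(ppv + fdr - 1) < 1e-12          # PPV + FDR = 1
assert abs(specificity + fpr - 1) < 1e-12  # TNR + FPR = 1
print("complement identities hold")
```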

# Interpretation

**Sensitivity:** Ability to detect the disease if a person has it. (Source)

**Specificity:** Ability to exclude people without the disease. (Source)

**PPV:** How likely is it that someone with a positive result has the disease? (Source)

**NPV:** How likely is it that someone with a negative result does not have the disease? (Source)

# When to Use

From Geeky Medics:

Test with **high specificity** => rule **in** an outcome when prediction is **positive**

- If high specificity, then $P(\hat{y} = 0 \mid y = 0) \approx 1$
- Thus, getting a result of $\hat{y} = 1$ probably means that $y = 1$
- Since if $y = 0$ then we would have gotten $\hat{y} = 0$

Test with **high sensitivity** => rule **out** an outcome when prediction is **negative**

- If high sensitivity, then $P(\hat{y} = 1 \mid y = 1) \approx 1$
- Thus, getting a result of $\hat{y} = 0$ probably means that $y = 0$
- Since if $y = 1$ then we would have gotten $\hat{y} = 1$

# Prevalence

- Sensitivity and specificity are **not** impacted by prevalence
  - They are properties of the test, not of the population
- PPV and NPV **are** impacted by prevalence
  - Higher prevalence -> higher PPV
  - At low prevalence -> we expect NPV > PPV
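These claims follow from Bayes' rule: $PPV = \frac{sens \cdot p}{sens \cdot p + (1 - spec)(1 - p)}$ and $NPV = \frac{spec \cdot (1 - p)}{spec \cdot (1 - p) + (1 - sens) \cdot p}$. A sketch that holds sensitivity and specificity fixed at 0.9 (an arbitrary illustrative value) and varies prevalence:

```python
def ppv(sens: float, spec: float, prev: float) -> float:
    """PPV from sensitivity, specificity, and prevalence via Bayes' rule."""
    return sens * prev / (sens * prev + (1 - spec) * (1 - prev))

def npv(sens: float, spec: float, prev: float) -> float:
    """NPV from sensitivity, specificity, and prevalence via Bayes' rule."""
    return spec * (1 - prev) / (spec * (1 - prev) + (1 - sens) * prev)

# Same test (sens = spec = 0.9), three different populations:
for prev in (0.01, 0.10, 0.50):
    print(f"prev={prev:.2f}  PPV={ppv(0.9, 0.9, prev):.3f}  NPV={npv(0.9, 0.9, prev):.3f}")
```

At 1% prevalence the PPV of this test is only about 0.08 despite 90% sensitivity and specificity, while the NPV is nearly 1 — exactly the low-prevalence pattern noted above.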