Why both false positives and false negatives are bad for COVID-19 tests. Why Bayes’ rule is important in these situations.

# What is a false positive/negative anyway?

A disease-screening medical test, like the one used to detect whether you are infected with the dreaded COVID-19 virus, essentially gives you a YES/NO answer. But here are some questions to think about.

- Would you **trust the answer** unequivocally?
- Is the **probability of a wrong answer** higher for a positive result than for a negative result?
- What is the **cost of a mistake** for a wrong answer? Are the costs the same for a ‘YES’ answer vs. a ‘NO’ answer?
- Is it better to get **multiple tests** done to increase the probability of getting a correct diagnosis? Does it make more sense for a ‘YES’ answer vs. a ‘NO’ answer?

**In fact, no test is 100% accurate**. You may have seen on the news that there is wide variation in accuracy among the tests that are being rapidly developed and deployed for COVID-19. But it turns out that even the term ‘accuracy’ means a very specific thing when it comes to medical tests.

If you think about it, there are four distinct scenarios for a particular test outcome on a specific person.

- You may be really infected, and the test says ‘YES’. This is called a **TRUE POSITIVE (TP)**.
- You may not be infected, but still, the test says ‘YES’. This is called a **FALSE POSITIVE (FP)**.
- You may not be infected, and the test says ‘NO’. This is called a **TRUE NEGATIVE (TN)**.
- You may be really infected, but the test says ‘NO’. This is called a **FALSE NEGATIVE (FN)**.

Now, from a personal point of view, I would be happy with the performance of the test if it can just detect the ‘right condition’ for me. That means if it has high TP and high TN rates, it does the job for me, *personally*. It is not *only* about detecting a positive COVID-19 patient with a ‘YES’ verdict, but it is *also* about correctly saying ‘NO’ for a COVID-19 negative patient.

The exact terminology can vary a little, but in almost all cases, ‘accuracy’ denotes the sum of TP and TN as a percentage of the total tests administered, i.e. (TP + TN) / (TP + TN + FP + FN).
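As a minimal sketch of that definition, the Python snippet below computes accuracy from the four counts. The population is hypothetical, chosen to mirror the 0.1% prevalence figure used later in this article: 100,000 people, 100 of them infected. A useless ‘test’ that always answers NO still scores very high accuracy.

```python
def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """Fraction of all administered tests whose verdict was correct."""
    total = tp + fp + tn + fn
    return (tp + tn) / total


# Hypothetical population: 100,000 people, 100 of them infected (0.1%).
infected, healthy = 100, 99_900

# A useless "test" that answers NO for everyone: it never says YES,
# so every healthy person is a TN and every infected person is an FN.
always_no = accuracy(tp=0, fp=0, tn=healthy, fn=infected)
print(f"Always-NO test accuracy: {always_no:.1%}")  # Always-NO test accuracy: 99.9%
```

This ‘test’ misses every single infected person, yet its accuracy looks excellent, because at low prevalence the TN count dominates the total.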

But accuracy is not the only metric by which a test should be judged. Equally important are other measures, such as the FP and FN numbers.

Why?

Because out of the four situations described above, only one leads to *non-action with no consequence*, i.e. the TN case. In this situation, after being tested, you go back home without taxing the healthcare system *and* without any long-term health repercussions.

Each of the other three situations carries some degree of cost (societal, medical, economic, whatever you want to call it). And the total cost to the state or nation may well depend on how the test performs on those metrics.

## Case of TRUE NEGATIVE (TN)

Let us cover the least expensive one first: the case of TN. As stated above, in this situation, after being tested, you go back home without taxing the healthcare system *and* without any long-term health repercussions. The only cost is the emotional toll on you while you wait for the test to be administered and for the result to come out.

## Case of TRUE POSITIVE (TP)

This is a personally dreaded scenario (but not the worst one!). You have been detected as a COVID-19 positive patient, and now the ordeal starts. Depending on your exact health situation and the severity of the symptoms, you may be advised to self-quarantine or check into a hospital. The costs are, of course, different in these two situations. The first taxes you and your immediate family more, whereas the second taxes the healthcare system significantly.

But, at least, you got a correct assessment! There is a worse outcome, which is the next case.

## Case of FALSE NEGATIVE (FN)

In the case of COVID-19, this is definitely the worst-case situation. A person with the pathogen in his/her lungs will go untreated. Depending on the underlying health conditions and many other physiological parameters, the outcome is not necessarily a fatality, but surely this has a higher personal and societal cost than the TP case. If this happens to someone in the high-risk cohort, a tragic (and possibly avoidable) loss of life is a real possibility.

## Case of FALSE POSITIVE (FP)

This is the most dreaded scenario for the medical system: a patient who, in reality, does not have the virus is declared positive. The consequences vary. The person may be temporarily admitted into the healthcare system, thereby overloading it and, more importantly, occupying extremely limited resources that could have served a truly positive patient. If the person is sent back home, he/she goes through enormous emotional upheaval for nothing, as he/she is not actually infected.

## Statisticians have been doing this for a long time

Essentially, this kind of YES/NO test falls under the so-called binary classification systems. Statisticians have been dealing with these systems for a long time, and they have their own names for the two kinds of mistakes: a false positive is a Type I error, and a false negative is a Type II error.

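They even have a fancy name for a tabular representation of all the scenarios we discussed: it is called a ‘**Confusion Matrix**’. It is a 2×2 table with the actual condition (infected or not) along one axis and the test verdict (YES or NO) along the other, so its four cells hold the TP, FP, TN, and FN counts.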
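To make the four cells concrete, here is a small sketch that tallies a confusion matrix from paired lists of actual conditions and test verdicts. The function name `confusion_counts` and the toy data are hypothetical, for illustration only.

```python
from collections import Counter


def confusion_counts(actual: list[bool], predicted: list[bool]) -> dict[str, int]:
    """Tally TP, FP, TN and FN from true infection status and test verdicts.

    In both lists, True means "infected" (actual) or a YES verdict (predicted).
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    cell_names = {
        (True, True): "TP",    # infected, test says YES
        (False, True): "FP",   # not infected, test says YES
        (False, False): "TN",  # not infected, test says NO
        (True, False): "FN",   # infected, test says NO
    }
    counts = Counter(cell_names[(a, p)] for a, p in zip(actual, predicted))
    # Report every cell, including those that never occurred (Counter gives 0).
    return {cell: counts[cell] for cell in ("TP", "FP", "TN", "FN")}


# Toy data for eight hypothetical people.
actual = [True, True, False, False, False, True, False, False]
predicted = [True, False, False, True, False, True, False, False]
print(confusion_counts(actual, predicted))
# {'TP': 2, 'FP': 1, 'TN': 4, 'FN': 1}
```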

The recent resurgence of machine learning systems and algorithms, many of which use some form of binary (or multi-class) classifier (e.g. logistic regression, decision trees, support vector machines, and neural networks) at their core, has made the confusion matrix popular. It is one of the most widely used tools for judging the performance of an ML system.

The great feature of this matrix is that once it is produced, we can calculate a number of useful metrics from just the four numbers, such as sensitivity, specificity, precision, and the false positive and false negative rates.
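Below is a minimal sketch of how those metrics fall out of the four counts. The function name `rates_from_counts` and the example counts are hypothetical; the formulas are the standard definitions.

```python
def rates_from_counts(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Standard rates derived from the four confusion-matrix counts."""
    return {
        # Of the truly infected, the fraction the test flags as YES.
        "sensitivity": tp / (tp + fn),
        # Of the truly healthy, the fraction the test clears as NO.
        "specificity": tn / (tn + fp),
        # Of the truly healthy, the fraction wrongly flagged YES (= 1 - specificity).
        "false_positive_rate": fp / (fp + tn),
        # Of the truly infected, the fraction wrongly cleared NO (= 1 - sensitivity).
        "false_negative_rate": fn / (fn + tp),
        # Of all YES verdicts, the fraction that are correct.
        "precision": tp / (tp + fp),
    }


# Hypothetical counts: 100 infected and 1,000 healthy people tested.
for name, value in rates_from_counts(tp=95, fp=20, tn=980, fn=5).items():
    print(f"{name:>20}: {value:.3f}")
```

This sketch does not guard against zero denominators (for example, a test that never says YES has no defined precision), so such cases raise `ZeroDivisionError`.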

Characteristics and variations of the specific biomedical test (or of the software algorithm in case of an ML system) will result in different numbers for these metrics. You can simply assign different costs to each of these metrics and tune the test/algorithm to minimize the overall cost.
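Purely to illustrate the mechanics of such tuning (and not as a COVID-19 cost analysis), the sketch below assigns hypothetical costs to each of the four outcomes and picks, among a few hypothetical operating points of one test, the one with the lowest expected cost per person tested. Every number here (the costs, the prevalence, and the sensitivity/specificity pairs) is made up.

```python
# Hypothetical cost of each outcome, in arbitrary units.
COSTS = {"TP": 10.0, "FP": 5.0, "TN": 0.0, "FN": 50.0}


def expected_cost(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Expected cost per person tested, given the test's rates and the prevalence."""
    p_tp = prevalence * sensitivity
    p_fn = prevalence * (1 - sensitivity)
    p_tn = (1 - prevalence) * specificity
    p_fp = (1 - prevalence) * (1 - specificity)
    return (p_tp * COSTS["TP"] + p_fn * COSTS["FN"]
            + p_tn * COSTS["TN"] + p_fp * COSTS["FP"])


# Hypothetical (sensitivity, specificity) pairs, e.g. different decision thresholds.
operating_points = {
    "lenient": (0.99, 0.90),
    "balanced": (0.95, 0.97),
    "strict": (0.85, 0.995),
}
prevalence = 0.05  # hypothetical

for name, (sens, spec) in operating_points.items():
    print(f"{name:>8}: {expected_cost(sens, spec, prevalence):.4f}")

best = min(operating_points, key=lambda n: expected_cost(*operating_points[n], prevalence))
print("lowest expected cost:", best)
```

With these made-up numbers the ‘balanced’ point wins; changing the relative costs of FP and FN shifts the optimum, which is exactly why the cost assignment matters.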

In the specific case of COVID-19, however, we will not venture into such an exercise. Cost-benefit analyses of such a life-altering, global pandemic should be left to experts and policy-makers at the highest level. As a data science practitioner, you may find it empowering to know that the same tools you use in your ML algorithms or statistical modeling are used to measure the success of mission-critical medical testing and public health systems.

We will, however, further discuss the utility of these measures for more advanced analysis of the test results using Bayesian probability inference.

# Bayes’ rule for COVID-19 tests?

## A statistical method of seeking a second opinion

Bayes’ theorem (alternatively Bayes’ law or Bayes’ rule) has been called the most powerful rule of probability and statistics. It describes the probability of an event, based on prior knowledge of conditions that might be related to the event.

**Bayes’ rule formula**

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

It is a powerful law of probability that brings the concept of ‘subjectivity’ or ‘the degree of belief’ into cold, hard statistical modeling. It lets us begin with a hypothesis and a certain degree of belief in that hypothesis, based on domain expertise or prior knowledge. Thereafter, we gather data and update our initial beliefs. If the data support the hypothesis, the probability goes up; if they do not, the probability goes down.

In the domain of medical testing, this continuous-update methodology means we need not settle for a single test. We can calculate the probability of a person being infected from the test data, repeat the test, feed the result of the previous calculation into the same formula, and update our probability.

This is just like seeking a second (or a third) opinion from a doctor about the diagnosis of a disease.

## The knowledge of false positives/negatives is directly applicable

How do you declare a person COVID-19 positive? After you get a positive result from the test.

But, as we discussed, every test result is uncertain to some extent. So we cannot say with 100% certainty that a person is COVID-19 positive; we can only say so with a high enough probability. Now, if we cast the testing process in terms of probability, here are a few quantities we can write down:

**P(COVID-19 positive| test = positive):** This denotes the probability that the person is really COVID-19 positive *given* that the test result is positive. It is called a conditional probability expression, and it is what we want to calculate. If you look at the Bayes’ rule formula above, you will recognize it as the **posterior P(A|B)**.

**P(test = positive|COVID-19 positive):** This is the **likelihood P(B|A)** in Bayes’ rule. It is nothing but the **sensitivity** of the test, i.e. the fraction of truly infected people (in reality) who get a positive test result, TP/(TP + FN).

**P(COVID-19 positive):** This is the probability that a random person has been infected by the COVID-19 virus. In the domain of medical testing, this is called the ‘**prevalence rate**’. This piece of information is not test-specific; it requires domain knowledge or a broader statistical measure. For COVID-19, experts may say, after poring over a lot of data from all over the world, that the general prevalence rate is 0.1%, i.e. 1 out of 1000 people may be infected with the virus. Of course, this number can change based on the country, health system, active social distancing measures, etc. This term is the **prior P(A)** in the numerator of Bayes’ rule.

**P(test=positive)**: This is the denominator in the Bayes’ rule equation i.e. **P(B)**. This can be calculated as,

$$P(\text{test}=\text{positive}) = P(\text{test}=\text{positive} \mid \text{COVID-19 positive})\,P(\text{COVID-19 positive}) + P(\text{test}=\text{positive} \mid \text{COVID-19 negative})\,P(\text{COVID-19 negative})$$

Clearly, this calculation takes into account the fact that a positive test result can come either from a truly infected person (a TRUE POSITIVE) or from a non-infected person (a FALSE POSITIVE). The term **P(test=positive|COVID-19 negative)** is simply the FALSE POSITIVE rate calculated from the confusion matrix, FP/(FP + TN), which equals 1 - specificity. The term **P(test=positive|COVID-19 positive)** is the sensitivity, as it appears in the numerator (discussed above).

Therefore, we can see that all the characteristics of a medical test can be readily utilized in a Bayesian calculation.
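Putting the pieces together, here is a minimal sketch of the single-test calculation. It uses the 0.1% prevalence figure mentioned above; the sensitivity of 99% and false positive rate of 1% are hypothetical values, not the characteristics of any real COVID-19 test. The function name `posterior_positive` is likewise just for illustration.

```python
def posterior_positive(prevalence: float, sensitivity: float,
                       false_positive_rate: float) -> float:
    """P(COVID-19 positive | test = positive) via Bayes' rule."""
    # Numerator: P(test=positive | COVID-19 positive) * P(COVID-19 positive)
    numerator = sensitivity * prevalence
    # Denominator, by the law of total probability:
    # P(test=positive) = P(pos | infected) P(infected) + P(pos | not infected) P(not infected)
    evidence = numerator + false_positive_rate * (1 - prevalence)
    return numerator / evidence


p = posterior_positive(prevalence=0.001, sensitivity=0.99, false_positive_rate=0.01)
print(f"P(infected | positive) = {p:.1%}")  # P(infected | positive) = 9.0%
```

Even with such a good test, a single positive result implies only about a 9% chance of infection at this low prevalence: per 100,000 people, roughly 99 true positives are swamped by roughly 999 false positives from the healthy majority.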

But there is more to Bayesian statistics than this!

## Chaining Bayes’ rule

The best thing about Bayesian inference is the **ability to use prior knowledge** in the form of the prior probability term in the numerator of Bayes’ theorem.

In this setting of COVID-19 testing, **the prior knowledge is simply the probability computed from one test, which is then fed back in as the prior for the next test**.

That means that when the prevalence rate in the general population is low, one way to increase confidence in the test result is to prescribe a subsequent test if the first result is positive, and apply the chained Bayes computation.
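A minimal sketch of that chained computation follows: the posterior after each positive result becomes the prior for the next test. It reuses the hypothetical 99% sensitivity and 1% false positive rate, starting from the 0.1% prevalence. It also rests on an assumption worth stating loudly: repeated test results are treated as conditionally independent given the person’s true status, which real repeated tests (same sample, same lab, same person) may not be.

```python
def update_on_positive(prior: float, sensitivity: float,
                       false_positive_rate: float) -> float:
    """One Bayes update after a positive result; returns the new P(infected)."""
    numerator = sensitivity * prior
    return numerator / (numerator + false_positive_rate * (1 - prior))


# Hypothetical test characteristics; the 0.1% prevalence is the starting prior.
SENSITIVITY = 0.99
FALSE_POSITIVE_RATE = 0.01
belief = 0.001

# ASSUMPTION: results are independent given true status, so the same
# sensitivity and false positive rate apply at every step of the chain.
for test_number in range(1, 4):
    belief = update_on_positive(belief, SENSITIVITY, FALSE_POSITIVE_RATE)
    print(f"after positive test #{test_number}: P(infected) = {belief:.1%}")
```

With these made-up numbers, the probability climbs from about 9% after one positive result to about 91% after two, and roughly 99.9% after three.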

## A step-by-step example

Look at the following article to understand the same process in the context of drug screening, which works exactly the same way as COVID-19 testing: *Bayes’ rule with a simple and practical example* (towardsdatascience.com), which demonstrates simple yet practical applications of Bayes’ rule with Python code. It goes through a numerical example, with plots and charts that make the calculations clear, and shows how the characteristics of a particular test can impact the overall confidence in the test result.

# Summary

The greatest global crisis since World War II and the largest global pandemic since the 1918–19 Spanish Flu is upon us today. Everybody is looking at the daily rise of the death toll and the rapid, exponential spread of this novel strain of the virus.

Data scientists, like so many people from all other walks of life, may also be feeling anxious. It may be somewhat reassuring to know that the familiar tools of data science and statistical modeling are very much relevant for analyzing the critical testing and disease-related data.

The goal of this article was to give an overview of some of the basic concepts in this regard. When you see a discussion about COVID-19 testing and its accuracy, you should ask these questions and judge the results in light of data-driven rationality.

Medical professionals and epidemiologists work with this kind of analysis all the time. It is time that we also share this knowledge and understanding as much as we can and apply it correctly in discussion and decision-making.

Stay safe, everybody!