Tuesday, August 15, 2017

Yet another contribution to the p-value discussion: Probabilities are maths, not logic

Hi,
for some reason I ran into the problem of Null Hypothesis Significance Testing (NHST),
also known as significance testing or the p-value approach, in the social and medical sciences.

What is it? The discussion is an old one, traceable to the origins of statistical testing
in the 1930s, and it is still alive as more and more scientific publications rest on statistical evidence. According to many qualified observers, the misuse of NHST enables a lot of bad science, where conclusions supposedly drawn from the facts are actually derived from faulty reasoning.

Claim: in the vernacular, the word "probability" lacks the rigorous definition provided by a mathematical theory. Going back to that definition, we see that the vague everyday notion of probability is not the same thing as a probability function on measurable sets. That said, there is no need for another approach: classical probability theory stands.

Here is a link to a classic article on this history, ironically titled "The Earth Is Round (p < .05)":
http://ist-socrates.berkeley.edu/~maccoun/PP279_Cohen1.pdf

and here, a collection of quotes from famous statisticians on the subject:
http://www.indiana.edu/~stigtsts/quotsagn.html

Probabilities can be tricky. The usual NHST misuse can be illustrated with a simple example (using the 5% threshold commonly adopted in research experiments):

- Assume H0, the null hypothesis: the patient is Normal,
- and H1, the alternative: the patient is Sick.
We have at hand a positive test result, from a test that misses only 4% of sick patients (a 4% false-negative rate: 96% of the Sick test positive).
--> H0 is rejected and the patient is declared Sick (a sketch of this decision rule follows below).
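As a minimal Python sketch of that flawed rule (the function and its names are mine, added only to make the reasoning explicit):

def naive_nhst_verdict(p_error, alpha=0.05):
    # WRONG step: treats the test's 4% error rate on sick patients as if it
    # were the probability that H0 ("the patient is Normal") is true.
    return "Sick (H0 rejected)" if p_error < alpha else "Normal (H0 kept)"

print(naive_nhst_verdict(0.04))  # -> Sick (H0 rejected)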

This only seems like a logical conclusion, because in the vernacular "probability" is a synonym for degree of truth: less than 5% is read as an objective evaluation of H0's truth, of its adequacy to reality; here its degree of truth is too low, and, as we know, what is not true is false. But converting probabilities into logical values can go wrong, as we will now see.

A probability is a mapping from a source set onto [0,1]. Sources can be different subsets of the population (the universe), and a different source implies a different probability mapping.

Thinking for a while, we understand that 96% is the probability of testing positive for the people who are actually Sick, not the probability of being Sick for the people with a positive test result. They are simply not the same mathematical functions. We can see this by writing p(Negative | Sick) + p(Positive | Sick) = 1, and, to dot the i's, p(Sick and Normal) = 0: the 4% and the 96% both live on the source set of Sick patients.
As we know, this is the reason for using subscripts (or the bar notation p(Positive | Sick)) for a conditional probability: to make it clear that it is not the same probability.
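To make the "different source, different mapping" point concrete, here is a small Python sketch (mine, using the counts from the mock-up table shown next): a conditional probability simply restricts the source set before counting.

# Mock-up population of 1100 patients, as (actual condition, test result)
# pairs, matching the confusion table below.
population = ([("Normal", "Negative")] * 900 + [("Normal", "Positive")] * 100 +
              [("Sick", "Negative")] * 4 + [("Sick", "Positive")] * 96)

def p(event, source):
    # Probability of `event`, measured on the subset `source`.
    return sum(1 for x in source if event(x)) / len(source)

sick = [x for x in population if x[0] == "Sick"]          # source: the Sick
positive = [x for x in population if x[1] == "Positive"]  # source: positive tests

# Same events, different source sets, different probabilities:
print(p(lambda x: x[1] == "Positive", sick))      # P(Positive | Sick) = 0.96
print(p(lambda x: x[0] == "Sick", positive))      # P(Sick | Positive) ~= 0.49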

The so-called "confusion" matrix, or table, makes it clearer.

Actual \ Tested    Negative result    Positive result
Normal             900                100
Sick               4                  96

The probability of being Sick given a Positive test is easily calculated: 96/(100+96) ≈ 49%, i.e. less than 50%.
In this mock-up, a positive test result leaves almost even chances of being sick or not.
(The risk is still much higher than for those with a negative test: p = 4/904 ≈ 0.4%.)
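For a cross-check, here is a short Python computation (mine, with the prevalence and rates read off the table above) that recovers the same figures via Bayes' rule:

# Quantities read off the mock-up confusion table.
p_sick = 100 / 1100              # prevalence: 100 Sick among 1100 patients
p_pos_given_sick = 96 / 100      # sensitivity: 96% of the Sick test positive
p_pos_given_normal = 100 / 1000  # 10% of the Normal test positive

# Total probability of a positive test, then Bayes' rule.
p_pos = p_pos_given_sick * p_sick + p_pos_given_normal * (1 - p_sick)
print(p_pos_given_sick * p_sick / p_pos)  # P(Sick | Positive) ~= 0.49
print((4 / 100) * p_sick / (1 - p_pos))   # P(Sick | Negative) ~= 0.0044, i.e. 4/904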

Unlike truth, a probability is relative to the set on which it is measured.

I think this is just how probability was defined in the first rigorous formalism, Kolmogorov's, in the 1930s, and that formalism is still all we need.

Here is an inspiring article on the newest incarnation of the same topic, tests vs. confidence intervals: Econometric Sense, "Confidence Intervals: Fad or Fashion".