Why you can not use statistics to dispute magic

It is a subtle point that statistical modeling is different from model based science. However, empirical scientists seem to go out of their way to conflate the two before the public (as statistical modeling is easier to perform and model based science is more highly rewarded). It is often claimed that model based science is being done when in fact statistics is what is being done (for instance, some of the unfortunate distractions caused by flawed reports on the important question of the magnitude of plausible anthropogenic global warming).

Both model based science and statistics are wonderful fields, but it is important to not receive the results of one when you have paid for the other.

We will pointedly discuss one of the differences. First let us define our terms.

I will take “model based science” to essentially mean Popperian Falsifiability (an alternative to positivism). This is roughly: you construct a statement or model and the model is said to only have empirical content if it is in theory possible to “falsify the model.” That is the model must form predictions that are specific enough to potentially be disproved. If you see a single instance of the model being wrong, you say the model is wrong (or at best incomplete). And you are done. Frankly, for all the philosophical sturm und drang this is closest to what is meant by science.

I will take statistical modeling to roughly mean Fisherian Null Hypothesis rejection. This is only one branch of statistics (in addition to Fisher’s methods we also have frequentist and Bayesian methods, in particular see: Controversies in the foundations of statistics, Bradley Efron, Amer. Math. Mon. 85, 231-246, 1978) but it is closest to what is actually performed in statistical studies.

You can see the two methods sound very similar: they both emphasize rejection of a hypothesis. But this is deceptive. In the case of Popperian falsifiability you are essentially holding on to a hypothesis that you believe, but are very willing to give it up (one wrong prediction and it is out). In the case of Fisherian rejection you don’t believe the null hypothesis, but you are holding back rejection until you collect enough data to get rid of it.

Let us go over that again.

In the falsifiable or model based science regime: a theory or model would be a prescriptive set of guidelines or laws that allows you to build things (like tall skyscrapers). If ever one of your skyscrapers unexpectedly falls, you know your theory is wrong and you revise. Rejection is quick. But essentially you honestly believed the theory while you were using it. You were on its side, and to counter this bias you agree to reject the theory on first failure.

In the statistical regime you never believed the null hypothesis. It is a stand-in that you are trying to find a lot of evidence against, to embarrass it out of existence. Because you know you are against the null hypothesis, you do two things to try to mitigate your bias against it: you operationally presume it is true during reasoning, and you don’t reject it until there is a lot of evidence against it.

To sum up: in model based science you believe the model and are confident it can’t be toppled easily (so you don’t defend it, as you are confident it will survive); in statistics you doubt the null hypothesis and you give it every chance to survive (because you are sure that it will not survive).
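To make the statistical regime concrete, here is a minimal Python sketch of a Fisherian test. The card-guessing setup (100 trials with four symbols, so a chance hit rate of 0.25), the 0.05 threshold, and the function name binomial_p_value are all my own illustrative assumptions, not data from any real study. The code presumes the null hypothesis is true and asks how surprising the observed hit count would be under that presumption.

```python
from math import comb


def binomial_p_value(hits: int, trials: int, chance: float) -> float:
    """One-sided exact p-value: P(X >= hits) for X ~ Binomial(trials, chance).

    This is the probability of doing at least this well if the null
    hypothesis (pure guessing at rate `chance`) is presumed true.
    """
    return sum(
        comb(trials, k) * chance**k * (1.0 - chance) ** (trials - k)
        for k in range(hits, trials + 1)
    )


# Hypothetical experiment (an assumption for illustration only):
# 100 guesses among 4 card symbols, so chance alone gives a 0.25 hit rate.
TRIALS = 100
CHANCE = 0.25
ALPHA = 0.05  # conventional rejection threshold, also an assumption

for hits in (25, 30, 35, 40):
    p = binomial_p_value(hits, TRIALS, CHANCE)
    if p < ALPHA:
        verdict = "reject the null"
    else:
        verdict = "fail to reject (this does NOT prove the null)"
    print(f"hits={hits:3d}  p-value={p:.4f}  {verdict}")
```

Notice that the only two verdicts available are "reject" and "fail to reject." There is no outcome in which the null hypothesis is confirmed.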

Now that I have stated my premises let us move on to the field I intended to criticize: paranormal powers.

To be deliberately rude: if you are investigating something that does not have a proposed mechanism that you are willing to test and reject you are not doing model based science. And by definition the paranormal is outside of current scientific explanation. It was too much to hope that we were doing model based science in this case (the appearance is deliberately that of science instead of statistics, but our science friends won’t help us call this out as they are often profiting from the same confusion). So you are doing statistics (and there is nothing wrong with that). But if you are doing statistics what is your null hypothesis?

  • Null Hypothesis Candidate 1: ESP does not exist.

    This is a plausible hypothesis and sounds “nully” (doesn’t claim much). But you would only be able to use this null hypothesis to try to prove the existence of ESP.

    But it is exactly the wrong hypothesis to disprove ESP. “The null hypothesis can never be proven” (see Null Hypothesis and Statsmanship). Fisherian testing is unfortunately a one-sided design; it can only reject a null hypothesis (not fully settle questions).

  • Null Hypothesis Candidate 2: ESP does exist.

    This is the null hypothesis you need to work with to reject ESP.

    But here is the trap. You must operationally work with the hypothesis (even if you don’t like it) during the rejection attempt. Since you are forced to “operationally accept” the null hypothesis for the duration of the study you have absolutely no defense against critiques like:

    ‘This latter review didn’t find any problems in our methodology or writeup itself, but suggested that, since the three of us (Richard Wiseman, Chris French and I) are all skeptical of ESP, we might have unconsciously influenced the results using our own psychic powers.’

    The paranormal is just one big game of Mornington Crescent. So if you failed to claim that there is no such thing as psychic dampening powers before your opponent accuses you of using such powers: you lose. The game is all about timing, not reality. If you don’t like this kind of situation, don’t get into this kind of situation.

    This is why you shouldn’t use statistics to study bullshit. Statistical testing methods are deliberately designed to be weak. Unfortunately they are easy to work around if given enough rope.

None of this would matter if it didn’t also hold for a lot of what is called mainstream science. Everyone wants the adulation of having important scientific results, but they seem only to want to pay to commission statistics.

Take big money pharmaceuticals as an example. Non-working drugs can deliver equivocal results forever (as long as you keep weakening the proposed claims after each study) and always being “on the verge” of a significant result can fund an endless number of studies and careers.

It is now past time to define what I meant by “magic.” Magic, for this article, is any hypothesis that is not sufficiently specific and bounded. You can design statistical studies to test many things, but only if you can specifically describe the limits of what you are attempting to study prior to the experimental work. There are two main classes of magic hypothesis: the powerful and the weak. Powerful magic hypotheses are unfalsifiable because they have no pre-defined limit on what they can bring in to defend themselves post experiment. Weak magic hypotheses are unfalsifiable for the simple reason that they can be revised after any experiment to claim the effect is present but just slightly more subtle than the resolving power of the last experiment.
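The weak magic dodge can be sketched numerically. The Python below uses the standard normal-approximation sample size formula for a one-sided test of a proportion. The guessing task with a chance hit rate of 0.25, the 0.05 significance level, the 80% power target, and the function name trials_needed are all illustrative assumptions of mine. The point is only the scaling: each time the claimed effect is halved, the number of trials needed to resolve it roughly quadruples, so the claim can always retreat faster than the budget can follow.

```python
from math import ceil, sqrt
from statistics import NormalDist


def trials_needed(chance: float, effect: float,
                  alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate trials needed to detect a hit rate of chance + effect.

    Normal approximation to the binomial, one-sided test of the null
    "hit rate == chance" at level `alpha` with the requested `power`.
    """
    if effect <= 0.0 or chance + effect >= 1.0:
        raise ValueError("effect must be positive and chance + effect < 1")
    p0 = chance            # hit rate under the null hypothesis
    p1 = chance + effect   # hit rate under the claimed effect
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)
    z_power = NormalDist().inv_cdf(power)
    numerator = z_alpha * sqrt(p0 * (1.0 - p0)) + z_power * sqrt(p1 * (1.0 - p1))
    return ceil((numerator / effect) ** 2)


# After each failed study the claimed effect is "revised" to half its size.
# All numbers here are hypothetical.
effect = 0.10
for study in range(1, 6):
    n = trials_needed(chance=0.25, effect=effect)
    print(f"study {study}: claimed effect={effect:.5f}  trials needed={n}")
    effect /= 2.0
```

Because the effect size was never bounded before the experimental work, there is no study in this sequence that is allowed to count as the last one.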

You must be very clear about when you are doing science and about when you are doing statistics. The unfortunate truth is: it is very difficult to successfully dispute junk science using tools as deliberately delicate as statistical hypothesis testing. Without a sufficiently critical mindset you get deliberately bad statistics, cargo cult science and dishonest math. A good essay on this (researchers wanting to claim the benefits of the trappings of mathematics, but not willing to meet the very strict pre-conditions required) is “The Pernicious Influence of Mathematics on Science,” Jack Schwartz, 1962 (collected in “Discrete Thoughts: Essays on mathematics, science, and philosophy,” Mark Kac, Gian-Carlo Rota, Jacob T. Schwartz, Birkhauser 1992).

Categories: Opinion


Data Scientist and trainer at Win Vector LLC. One of the authors of Practical Data Science with R.