In statistical work in the age of big data we often get hung up on differences that are statistically significant (reliable enough to show up again and again in repeated measurements), but clinically insignificant (visible in aggregation, but too small to make any real difference to individuals).

An example would be a diet that changes individual weight by an ounce on average, with a standard deviation of a pound. With a large enough population the diet's effect is statistically significant. It could even be used to shave an ounce off a national average weight. But for any one individual, this diet is largely pointless.
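To make the diet arithmetic concrete, here is a rough sketch in Python. It assumes a one-sample z-test with a known standard deviation, and it assumes the observed average change lands exactly on one ounce; both are simplifications of mine for illustration, not part of the example above. The names `EFFECT_OZ`, `SD_OZ`, and `two_sided_p` are made up for this sketch.

```python
from math import sqrt
from statistics import NormalDist

# Assumed numbers from the diet example: one ounce average change,
# one pound (16 ounces) standard deviation across individuals.
EFFECT_OZ = 1.0
SD_OZ = 16.0


def two_sided_p(n, effect=EFFECT_OZ, sd=SD_OZ):
    """Two-sided p-value of a one-sample z-test, assuming the observed
    mean change equals `effect` exactly (an idealization)."""
    z = effect / (sd / sqrt(n))  # standard error shrinks like 1/sqrt(n)
    return 2.0 * NormalDist().cdf(-abs(z))


# The dimensionless effect size does not depend on the sample size.
cohens_d = EFFECT_OZ / SD_OZ

# The p-value, in contrast, is driven almost entirely by the sample size.
p_values = {n: two_sided_p(n) for n in (100, 1_000, 10_000)}

print(f"Cohen's d = {cohens_d}")
for n, p in p_values.items():
    print(f"n = {n:>6,}  p = {p:.3g}")
```

Under these assumptions the same one-ounce effect is far from significant with 100 subjects, slips just under the usual 0.05 threshold with 1,000, and is overwhelmingly significant with 10,000. The effect on any one person never changed.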

The concept is teachable, but we have always stumbled over the naming: “statistical significance” versus “practical clinical significance.”

I suggest trying the word “substantial” (and its antonym “insubstantial”) to describe whether changes are physically large or small.

This comes down to having to remind people that “p-values are not effect sizes”. In this article we recommended reporting three statistics: a units-based effect size (such as the expected change in pounds), a dimensionless effect size (such as Cohen’s d), and a measure of the reliability of the experiment size (such as a statistical significance, which at best measures only one possible risk: re-sampling risk).
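As a sketch of what reporting all three statistics side by side might look like, the Python below simulates weight changes and summarizes them. Everything here is an assumption for illustration: the data is synthetic (a one-ounce average loss with a one-pound standard deviation), `summarize` is a hypothetical helper, Cohen's d is taken in its one-sample form (mean over standard deviation), and the p-value uses a normal approximation rather than an exact t-test.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean, stdev


def summarize(deltas):
    """Return the three statistics for a list of per-person weight
    changes in pounds (hypothetical helper, normal approximation)."""
    n = len(deltas)
    mean_delta = fmean(deltas)       # units-based effect size (pounds)
    sd = stdev(deltas)               # sample standard deviation
    d = mean_delta / sd              # dimensionless effect size (one-sample Cohen's d)
    z = mean_delta / (sd / sqrt(n))  # test statistic against "no change"
    p = 2.0 * NormalDist().cdf(-abs(z))  # two-sided significance
    return {"effect_lb": mean_delta, "cohens_d": d, "p_value": p}


# Synthetic data, assumed for illustration: average loss of 1/16 lb, sd of 1 lb.
rng = random.Random(2024)
deltas = [rng.gauss(-1 / 16, 1.0) for _ in range(100_000)]

report = summarize(deltas)
for name, value in report.items():
    print(f"{name:>9}: {value:.3g}")
```

With a sample this large the p-value is vanishingly small, while both effect sizes stay tiny. Reading the three numbers together is what separates "reliably detected" from "large enough to matter".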

The merit is: if we don’t confound different meanings, we may be less confusing. A downside is: some of these measures are a bit technical to discuss. I’d be interested in hearing opinions, and teaching experiences, around these distinctions.


### jmount

Data Scientist and trainer at Win Vector LLC. One of the authors of Practical Data Science with R.

To go with “substantial” as a synonym for “practically significant” I propose “statistically discernible” as a replacement for “statistically significant”. (I recently learned that the word “significant” had a different meaning when Fisher created the phrase: https://www.johndcook.com/blog/2008/11/17/origin-of-statistically-significant/ )

Neat! At best significance as it is now means something like “discernible” or “repeatable”.

Effect size is how I know that.

Yes, but what I am trying to avoid is leaning into the phrase “significant effect size” (instead being able to say “substantial effect size” when trying to be qualitative).

I try not to mention statistical significance with my non-statistician colleagues. Instead, I use the terms “confident/not confident.” Using the jargon of “significance” with non-statisticians is asking for misconceptions to spread.

Statistical significance should be a threshold for using the results to make decisions. If they didn’t pass the test, I’ll say no conclusion could be made. Then I’ll discuss with them if another analysis should be done with a larger sample.