
Fast food, fast publication

The following article is getting quite a lot of press right now: David Just and Brian Wansink (2015), “Fast Food, Soft Drink, and Candy Intake is Unrelated to Body Mass Index for 95% of American Adults”, Obesity Science & Practice, forthcoming (a new pay-for-placement journal). Obviously it is a sensational contrary position (some coverage: here, here, and here).

I thought I would take a peek to learn about the statistical methodology (see here for some commentary). I would say the kindest thing you can say about the paper is: its problems are not statistical.

At this time the authors don’t seem to have supplied their data preparation or analysis scripts and the paper “isn’t published yet” (though they have had time for a press release), so we have to rely on their pre-print. Read on for excerpts from the work itself (with commentary).

Using 2007-2008 Centers for Disease Control’s National Household and Nutrition Examination Survey, the consumption incidence of targeted foods on two non-continuous days was examined across discrete ranges of BMI.

(My understanding is the NHANES is a “day later recall” survey, so at best we are measuring “reported consumption incidence,” not consumption. So even done well, the strongest conclusion such a study could support would be something like “people are bad at remembering how much they ate.” This reminds one of the title of an earlier book by Wansink: “Mindless Eating: Why We Eat More Than We Think.” Frankly, this sounds like a dataset unsuitable for establishing anything like the paper’s title.)

Data were analyzed in 2011.

(Okay, not a “fast” publication. So was this also published in 2011? Or was it something that has been claimed for four years and is now being substantiated?)

After excluding the clinically underweight and morbidly obese, consumption of fast food, soft drinks or candy was not positively correlated with measures of BMI.

(Eliminate enough outcome variation and there is no variation to measure/explain.)
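Some R code illustrating the range-restriction point follows. This is a simulation, not the paper’s data; all the numbers (and the seed) are invented for illustration. BMI is constructed to genuinely depend on intake, yet trimming the BMI tails attenuates the observed correlation:

```r
# Simulation (not the paper's data): restricting the outcome's range
# weakens any correlation with it.
set.seed(2015)  # arbitrary seed
n <- 10000
intake <- rnorm(n)                  # stand-in for reported consumption
bmi <- 25 + 2*intake + 4*rnorm(n)   # BMI built to depend on intake

cor(intake, bmi)                    # full sample: clearly positive
keep <- bmi >= 18.5 & bmi < 40      # drop the underweight and morbidly obese
cor(intake[keep], bmi[keep])        # restricted sample: attenuated
```

Selecting on the outcome variable tends to shrink the observed correlation toward zero (classic range restriction), so the exclusion itself works against finding the very relation the paper reports not finding.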

We restrict our sample to adults, defined as age 18 or older, who completed two 24-hour dietary recall surveys.

(It plausibly takes more than two days of measurements to get a good image of long term eating habits. Also most “food regulation”, a topic these authors have written on, is targeted at children. So for a useful public policy analysis it would have been nice to leave them in.)

We focus on eating episode rather than amount eaten because it is less subject to recall bias.

(Leaving out amount breaks the actual relation between eating and health. Also, some effective diets advise more sittings of much smaller portions. Finally, haven’t changes in fast-food portion sizes been a huge issue?)

We compare average eating episodes within food and across BMI categories.

(I am guessing this means they are modeling BMI category code instead of the BMI number. There are only about 3 BMI category codes left after “excluding the clinically underweight and morbidly obese.” Again eliminate variation in the measured outcome, and nothing will correlate to it.)

Missing data were omitted from the analysis …

(Just dropping missing data is not likely to work with interview data, unless you truly believe censoring is completely independent of health, diet, and health/diet interactions.)
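A quick sketch of why, again simulated with arbitrary parameters (nothing here is from NHANES): if the probability of a missing answer rises with the quantity being measured, the complete-case estimate is biased.

```r
# Simulation (not NHANES): nonresponse that depends on the outcome
# biases the complete-case estimate.
set.seed(2015)  # arbitrary seed
n <- 10000
bmi <- rnorm(n, mean=27, sd=5)
# Suppose heavier respondents are more likely to skip the question:
p_missing <- plogis((bmi - 27)/3)
reported <- ifelse(runif(n) < p_missing, NA, bmi)

mean(bmi)                    # true mean (about 27)
mean(reported, na.rm=TRUE)   # complete-case mean: biased low
```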

Likewise, those with normal BMIs consume an average of 1.1 salty snacks over two days, while overweight, obese, and morbidly obese consume an average 0.9, 1.0, and 0.9 salty snacks, respectively.

(Uh, I thought we were “excluding the clinically underweight and morbidly obese.” I guess this is a different analysis. But here is a statistical issue: it really doesn’t look like the independent variable (“salty snacks”) is varying. This is what you would expect if there were no relation, or if the relation were masked by additional (omitted) variables. However, it could also be an artifact of data treatment (such as subsetting down to only complete cases), in which case it would not be evidence of a lack of relation. Since there isn’t a complete methods section, we are left to wonder if the analysis is really looking at the claimed underlying data, or just at aggregate values (which could not see beneath these barely varying averages, and would be consistent with the mentioned ANOVA methodology).)

From: Table 1. Average Instances of Consumption in 48 Hours of Various Food Items, Sorted by BMI

(I’m not a statistician, but a negative p-value? Maybe that is some variation of z? But the weird values are not just in one column. Is all this just off one ANOVA table? Also, why not try a linear regression on BMI score using non-grouped data, or a logistic regression on BMI category?)
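For the record, both of the suggested analyses are one-liners in R. A sketch on simulated data (all numbers and effect sizes invented; the actual NHANES file would substitute directly):

```r
# Sketch of the suggested alternatives (simulated data, invented effects).
set.seed(2015)  # arbitrary seed
n <- 5000
episodes <- rpois(n, lambda=2)              # eating episodes over two days
bmi <- 25 + 0.5*episodes + rnorm(n, sd=4)   # built-in relation to recover

# Linear regression on the ungrouped BMI score:
summary(lm(bmi ~ episodes))$coefficients

# Logistic regression on a BMI category (here: overweight or not):
overweight <- as.numeric(bmi >= 25)
summary(glm(overweight ~ episodes, family=binomial))$coefficients
```

Either model uses the individual-level variation that grouped averages throw away.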

Also, when the input (or “independent”) variables are not known to be independent of each other, ANOVA is variable-order dependent! Usually this is handled by experiment design, but in this case we are observing eating patterns, not assigning them.

Some R code showing the effect is given below. Notice all of the x’s have the same relation to y, but the ANOVA analysis assigns effect in variable order. It does not make any sense to say “x1 is significant, but x10 is not” as the F-scores are not about each variable in isolation.

d <- data.frame(y=rnorm(100))
for(i in 1:10) {
  d[[paste('x', i, sep='')]] <- d$y + rnorm(nrow(d))
}
print(anova(lm(y ~ ., data=d)))
## Analysis of Variance Table
## Response: y
##           Df Sum Sq Mean Sq  F value    Pr(>F)    
## x1         1 70.643  70.643 640.0173 < 2.2e-16 ***
## x2         1 22.647  22.647 205.1824 < 2.2e-16 ***
## x3         1  5.285   5.285  47.8821 6.425e-10 ***
## x4         1  6.588   6.588  59.6906 1.491e-11 ***
## x5         1  2.382   2.382  21.5771 1.155e-05 ***
## x6         1  3.027   3.027  27.4269 1.063e-06 ***
## x7         1  0.494   0.494   4.4757   0.03714 *  
## x9         1  1.914   1.914  17.3441 7.137e-05 ***
## x10        1  0.376   0.376   3.4048   0.06830 .  
## Residuals 90  9.934   0.110                       
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
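Re-running the same sort of simulation with the variable order reversed makes the order dependence explicit: the sequential (type I) sums of squares reassign the dominant F-score to whichever variable happens to enter first. (The seed below is arbitrary; the code above does not fix one.)

```r
# Same construction as above, fit in two different variable orders.
set.seed(52351)  # arbitrary seed
d <- data.frame(y=rnorm(100))
for(i in 1:10) {
  d[[paste('x', i, sep='')]] <- d$y + rnorm(nrow(d))
}
f1 <- y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 + x10
f2 <- y ~ x10 + x9 + x8 + x7 + x6 + x5 + x4 + x3 + x2 + x1
anova(lm(f1, data=d))  # x1 enters first and gets the dominant F-score
anova(lm(f2, data=d))  # reversed: now x10 gets the dominant F-score
```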

I am sure I got a few points wrong, but I just don’t see a strong result here.

I’ll just end with: it is of course difficult to prove a non-effect, but a single analysis failing to find an effect is not strong evidence against an effect. A single study not finding a relation doesn’t make two things unrelated. This analysis (seemingly driven entirely off one or two aggregated ANOVA tables, evidently without also trying the simple standard techniques of linear or logistic regression) does not in fact seem sensitive enough to see effects even if there are any.


There is some fresh discussion of this at Andrew Gelman’s site, as “Fast analysis, soft statistics, and junk data intake is unrelated to research quality for 0% of American scientists.” I’d like to emphasize that I also have nothing personal against these researchers. It is just a bit of “if you are not going to police yourself I may try to do some of it for you, and if I happen to do so I will not be in a good mood or sensitive to fine distinctions.”

Professor Gelman summed it up well:

To not slam people for low-quality work is implicitly to hurt all the serious researchers out there who don’t just publish anything, who don’t hype their work, who are careful maybe to run their statistics by an expert (yes, Cornell has many excellent statisticians) rather than trying to sneak substandard work into print.


Data Scientist and trainer at Win Vector LLC. One of the authors of Practical Data Science with R.

5 replies

  1. There are so many issues with this paper that I do not know where one should start. Maybe it’s best not to do so. It’s so easy to pick off one study that (arguably) should have been tripped up at the first hurdle by a careful referee, when many other papers fly free, possibly guilty of the same errors. In recent years as a reviewer I have found myself feeling guilty of being some kind of curmudgeon for stopping papers that I thought failed some pretty basic tests of logic, statistical analysis, and scientific objectivity. Most of those papers were better than this one. So I’m not sure if I should feel depressed about the state of science in general, or optimistic that I’ve saved some poor author from the odium of the blogosphere. For my sanity, I choose the latter…

    But, really, negative p-values? This is enough to drive anyone to become a Bayesian!

    1. There is definitely the problem that while I am picking one nit I may appear to be endorsing the rest of the paper. Mostly I just looked for things that were both wrong and portable enough to extract. It is too bad they didn’t show this to somebody that could have helped them avoid publication. It is also too bad they went for such a sensational title.

  2. A quick search of the Googles should dissuade anyone from using ANOVA to analyze count data (as would an intro epidemiology course).
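To make the commenter’s point concrete, here is what that standard tool looks like: a Poisson regression on simulated counts (all parameters invented for illustration).

```r
# Sketch (simulated): a Poisson regression for count outcomes such as
# "eating episodes in 48 hours", instead of ANOVA on the counts.
set.seed(2015)  # arbitrary seed
n <- 2000
bmi <- rnorm(n, mean=27, sd=5)
episodes <- rpois(n, lambda=exp(0.5 + 0.02*bmi))  # invented rate model

fit <- glm(episodes ~ bmi, family=poisson)
summary(fit)$coefficients  # slope is a log rate ratio per BMI unit
```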