How Much Data Do You Need?

Introduction: A common question in analytics, statistics, and data science projects is: how much data do you need? This question actually has very specific and clear answers! A good first answer is “it is good to have a lot.” Let’s dig deeper and get some more detailed quantitative answers. […]

Thinking About Linear Regression

Introduction: I want to spend some time thinking out loud about linear regression. As a data science consultant and teacher, I spend a lot of time using linear regression and teaching linear regression. I have found each of these pursuits can degenerate into mere doctrine or instructions: “do this,” “expect […]

“Statistics to English Translation”

The core of our “statistics to English translation” series is Nina Zumel’s sequence of articles: “I don’t think that means what you think it means”; Statistics to English Translation, Part 1: Accuracy Measures; Statistics to English Translation, Part 2a: ‘Significant’ Doesn’t Always Mean ‘Important’; Statistics to English Translation, Part 2b: […]

On The Decomposition of Variance

I am conducting another machine learning / AI bootcamp this week. Starting one of these always makes me want to get more statistical commentaries down, just in case I need one. These classes have to move fast, and also move correctly. In this case I want to write about decomposition […]

Why Do We Plot Predictions on the x-axis?

When studying regression models, one of the first diagnostic plots most students learn is to plot residuals versus the model’s predictions (that is, with the predictions on the x-axis). Here’s a basic example. # build an “ideal” linear process. set.seed(34524) N = 100 x1 = runif(N) x2 = runif(N) noise […]

Talking about clinical significance

In statistical work in the age of big data we often get hung up on differences that are statistically significant (reliable enough to show up again and again in repeated measurements), but clinically insignificant (visible in aggregation, but too small to make any real difference to individuals). An example would […]