A client recently came to us with a question: what’s a good way to monitor data or model output for changes? That is, how can you tell if new data is distributed differently from previous data, or if the distribution of scores returned by a model have changed? This client, […]
Estimated reading time: 17 minutes
I’ve been thinking a bit on statistical tests, their absence, abuse, and limits. I think much of the current “scientific replication crisis” stems from the fallacy that “failing to fail” is the same as success (in addition to the forces of bad luck, limited research budgets, statistical naiveté, sloppiness, pride, […]
Estimated reading time: 11 minutes
One of the trickier tasks in clustering is determining the appropriate number of clusters. Domain-specific knowledge is always best, when you have it, but there are a number of heuristics for getting at the likely number of clusters in your data. We cover a few of them in Chapter 8 […]
Estimated reading time: 10 minutes
XCOM: Enemy Unknown is a turn based video game where the player choses among actions (for example shooting an alien) that are labeled with a declared probability of success. Image copyright Firaxis Games A lot of gamers, after missing a 80% chance of success shot, start asking if the game’s […]
Estimated reading time: 33 minutes
In this installment of our ongoing Statistics to English Translation series1, we will look at the technical meaning of the term ”significant”. As you might expect, what it means in statistics is not exactly what it means in everyday language. As always, a pdf version of this article is available […]
Estimated reading time: 22 minutes