In our last article on the algebra of classifier measures we encouraged readers to work through Nina Zumel’s original “Statistics to English Translation” series. This series has become slightly harder to find as we have used the original category designation “statistics to English translation” for additional work.
To make things easier, here are links to the original three articles, which work through scores and significance and include a glossary.
- “I don’t think that means what you think it means;” Statistics to English Translation, Part 1: Accuracy Measures
- Statistics to English Translation, Part 2a: ’Significant’ Doesn’t Always Mean ’Important’
- Statistics to English Translation, Part 2b: Calculating Significance
Much of what Nina presents can be summed up in the diagram below (also by her). If the first row of the diagram is the truth (say red disks are infected), which classifier is the better initial screen for infection? Should you prefer the 80% accurate model 1 row or the 70% accurate model 2 row? This example helps break dependence on “accuracy as the only true measure” and promote discussion of additional measures.
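To make the question concrete, here is a small Python sketch. The labels are hypothetical: they are not read off Nina’s diagram, only chosen so that one model scores 80% accuracy and the other 70%. The function name `measures` and the variable names are likewise illustrative. The sketch computes sensitivity and specificity alongside accuracy for each model.

```python
# Hypothetical data: 1 = infected, 0 = not infected.
# These labels are NOT taken from the diagram; they only reproduce
# the 80% versus 70% accuracy contrast.
truth   = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
model_1 = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]  # misses two infected cases
model_2 = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]  # raises three false alarms


def measures(truth, pred):
    """Return accuracy, sensitivity, and specificity for binary labels."""
    pairs = list(zip(truth, pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # share of infected cases caught
        "specificity": tn / (tn + fp),  # share of healthy cases cleared
    }


for name, pred in [("model 1", model_1), ("model 2", model_2)]:
    print(name, measures(truth, pred))
```

With these made-up labels the more accurate model catches only one of the three infected cases, while the less accurate model catches all three at the cost of some false alarms. For an initial screen, where a missed infection is usually the more expensive error, that trade may well be worth making.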
Data Scientist and trainer at Win Vector LLC. One of the authors of Practical Data Science with R.