Our publisher, Manning, is running a Memorial Day sale this weekend (May 24-27, 2019), with a new offer every day. Fri: Half off all eBooks; Sat: Half off all MEAPs; Sun: Half off all pBooks and liveVideos; Mon: Half off everything. The discount code is: wm052419au. Many great opportunities to […]
Estimated reading time: 30 seconds
Kudos to Professor Andrew Gelman for telling a great joke at his own expense: Stupid-ass statisticians don’t know what a goddam confidence interval is. He brilliantly burlesqued a frustratingly common occurrence that many people say they “have never seen happen.” One of the pains of writing about data science is there […]
Estimated reading time: 1 minute
Authors: John Mount (more articles) and Nina Zumel (more articles). Our four-part article series, collected into one piece. Part 1: The problem; Part 2: In-training set measures; Part 3: Out of sample procedures; Part 4: Cross-validation techniques. “Essentially, all models are wrong, but some are useful.” George Box Here’s […]
Estimated reading time: 35 minutes
Authors: John Mount (more articles) and Nina Zumel (more articles). In this article we conclude our four-part series on basic model testing. When fitting and selecting models in a data science project, how do you know that your final model is good? And how sure are you that it’s […]
Estimated reading time: 7 minutes
Authors: John Mount (more articles) and Nina Zumel (more articles). When fitting and selecting models in a data science project, how do you know that your final model is good? And how sure are you that it’s better than the models that you rejected? In this Part 3 of our […]
Estimated reading time: 7 minutes
Authors: John Mount (more articles) and Nina Zumel (more articles). When fitting and selecting models in a data science project, how do you know that your final model is good? And how sure are you that it’s better than the models that you rejected? In this Part 2 of our […]
Estimated reading time: 9 minutes
Authors: John Mount (more articles) and Nina Zumel (more articles). “Essentially, all models are wrong, but some are useful.” George Box Here’s a caricature of a data science project: your company or client needs information (usually to make a decision). Your job is to build a model to predict that […]
Estimated reading time: 14 minutes
I’ll admit it: I have been wrong about statistics. However, that isn’t what this article is about. This article is less about some of the statistical mistakes I have made, as a mere working data scientist, and more of a rant about the hectoring tone of corrections from some statisticians […]
Estimated reading time: 24 minutes
In Gelman and Nolan’s paper “You Can Load a Die, But You Can’t Bias a Coin” (The American Statistician, November 2002, Vol. 56, No. 4), it is argued that you can’t easily produce a coin that is biased when flipped (and caught). A number of variations that can be easily biased […]
Estimated reading time: 9 minutes