Two related fallacies I see in machine learning practice are the shift and balance fallacies (for an earlier simple fallacy, please see here). They involve thinking logistic regression has a bit simpler structure than it actually does, and also thinking logistic regression is a bit less powerful than it actually […]

Estimated reading time: 7 minutes

Nina Zumel just completed an excellent short sequence of articles on picking optimal utility thresholds to convert a continuous model score for a classification problem into a deployable classification rule: “Squeezing the Most Utility from Your Models” and “Estimating Uncertainty of Utility Curves.” This is very compatible with our advice to […]

Estimated reading time: 1 minute
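The threshold-picking idea can be sketched in a few lines: given per-outcome utilities, sweep candidate thresholds and keep the one with the highest average utility. This is a minimal illustration, not Zumel's actual method; the utility values and simulated data below are hypothetical.

```python
# Sketch: pick the classification threshold that maximizes average utility.
# Utilities and data are hypothetical, for illustration only.
import numpy as np

rng = np.random.default_rng(0)
n = 1000
y = rng.binomial(1, 0.3, size=n)                             # true labels
score = np.clip(y * 0.2 + rng.uniform(0, 1, n) * 0.8, 0, 1)  # model scores

# hypothetical per-outcome utilities
u_tp, u_fp, u_tn, u_fn = 10.0, -5.0, 0.0, -2.0

def avg_utility(threshold):
    """Average utility of classifying at the given score threshold."""
    pred = score >= threshold
    tp = np.sum(pred & (y == 1))
    fp = np.sum(pred & (y == 0))
    tn = np.sum(~pred & (y == 0))
    fn = np.sum(~pred & (y == 1))
    return (u_tp * tp + u_fp * fp + u_tn * tn + u_fn * fn) / n

thresholds = np.linspace(0, 1, 101)
utilities = [avg_utility(t) for t in thresholds]
best = thresholds[int(np.argmax(utilities))]
print(f"best threshold: {best:.2f}")
```

The grid search stands in for the curve-based analysis in the articles; the point is that the deployable rule is a utility-driven choice, not a default 0.5 cutoff.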

I want to talk about a misconception on the difference between inference and prediction. For a well run analytically oriented business, there may not be as many reasons to prefer inference over prediction as one may have heard. A common refrain is: data scientists are in error in centering so much […]

Estimated reading time: 8 minutes

I am conducting another machine learning / AI bootcamp this week. Starting one of these always makes me want to get more statistical commentaries down, just in case I need one. These classes have to move fast, and also move correctly. In this case I want to write about decomposition […]

Estimated reading time: 5 minutes

Introduction I’d like to talk about the Kolmogorov Axioms of Probability as another example of revisionist history in mathematics (another example here). What is commonly quoted as the Kolmogorov Axioms of Probability is, in my opinion, a less insightful formulation than what is found in the 1956 English translation of […]

Estimated reading time: 24 minutes

What we’ve got here is failure to communicate Suppose I were to say: “any natural number can be written uniquely, up to order, as a, possibly empty, finite product of prime number(s).” This seems possibly correct, and possibly even careful. Though, one may have to look up the terms (such […]

Estimated reading time: 13 minutes
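The quoted statement can be checked concretely: every natural number n ≥ 1 factors as a possibly empty finite product of primes, with n = 1 corresponding to the empty product. A minimal sketch (the helper names and trial-division approach are mine, not from the post):

```python
# Sketch: each natural number n >= 1 as a possibly empty product of primes.
def prime_factors(n):
    """Prime factors of n with multiplicity, in nondecreasing order."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors

def product(xs):
    out = 1  # the empty product is 1, which is why n = 1 fits the statement
    for x in xs:
        out *= x
    return out

print(prime_factors(1))   # [] -- the empty product
print(prime_factors(12))  # [2, 2, 3]
```

Note the “possibly empty” and “up to order” qualifiers in the statement are doing real work: without them, 1 would be a counterexample and uniqueness would fail trivially.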

I am finishing up a work-note that has some really neat implications as to why working with AUC is more powerful than one might think. I think I am far enough along to share the consequences here. This started as some, now reappraised, thoughts on the fallacy of thinking knowing […]

Estimated reading time: 3 minutes

I’ve added a worked R example of the non-convexity, with respect to model parameters, of square loss of a sigmoid-derived prediction here. This is finishing an example for our Python note “Why not Square Error for Classification?”. Reading that note will give a usable context and background for this diagram. […]

Estimated reading time: 59 seconds
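The non-convexity can be exhibited without a plot. For a convex function, the value at a midpoint never exceeds the average of the endpoint values (Jensen's inequality); square loss of a sigmoid prediction violates this. A minimal sketch in Python (the referenced worked example is in R; the parameter values here are my own choice):

```python
# Sketch: square loss of a sigmoid output is non-convex in the parameter b.
# Convexity test: loss(midpoint) <= average of endpoint losses. It fails here.
import math

def sigmoid(b):
    return 1.0 / (1.0 + math.exp(-b))

def sq_loss(b, y=1.0):
    """Square loss of the sigmoid prediction at parameter b, target y."""
    return (sigmoid(b) - y) ** 2

b1, b2 = 0.0, -10.0
mid = 0.5 * (b1 + b2)
lhs = sq_loss(mid)                        # loss at the midpoint
rhs = 0.5 * (sq_loss(b1) + sq_loss(b2))  # chord value at the midpoint
print(lhs > rhs)  # True: midpoint loss lies above the chord, so not convex
```

This is one reason square loss is a poor fit for sigmoid-output classifiers: the optimization surface has flat, non-convex regions that log loss avoids.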

There’s a common, yet easy to fix, mistake that I often see in machine learning and data science projects and teaching: using classification rules for classification problems. This statement is a bit of word-play which I will need to unroll a bit. However, the concrete advice is that you often […]

Estimated reading time: 6 minutes
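The concrete advice can be sketched as: have the model return scores (probabilities), and let each downstream use pick its own cutoff, rather than baking a single hard classification rule into the model. The data and cutoffs below are hypothetical illustrations.

```python
# Sketch: return model scores, not hard class labels.
import numpy as np

scores = np.array([0.05, 0.40, 0.55, 0.90])  # model's predicted probabilities

# A hard classification rule bakes in one threshold (often 0.5) and
# throws away the score information:
hard_labels = (scores >= 0.5).astype(int)

# Keeping the score lets each consumer apply its own utility-driven cutoff:
cautious = (scores >= 0.9).astype(int)    # only act on near-certain cases
sensitive = (scores >= 0.25).astype(int)  # screening use, prefer recall

print(hard_labels, cautious, sensitive)
```

The same fitted model then serves several decision rules at once, each matched to its own costs and benefits.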