Two related fallacies I see in machine learning practice are the shift and balance fallacies (for an earlier simple fallacy, please see here). They involve thinking logistic regression has a bit simpler structure than it actually does, and also thinking logistic regression is a bit less powerful than it actually […]
This note is a little break from our model homotopy series. I have a neat example where one combines two classifiers to get a better classifier using a method I am calling “ROC surgery.” In ROC surgery we look at multiple ROC plots and decide we want to cut out […]
So are model homotopies commonly used? Yes, they are.
Let’s take a stab at our first note on a topic that pre-establishing the definitions of probability model homotopy makes much easier to write. In this note we will discuss tailored probability models. These are models deliberately fit to training data that has an outcome prevalence equal to the expected […]
I am planning a new example-based series of articles using what I am calling probability model homotopy. This is a notation I am introducing to slow down and clarify the discussion of how probability models perform on different populations.
Nina Zumel just completed an excellent short sequence of articles on picking optimal utility thresholds to convert a continuous model score for a classification problem into a deployable classification rule: “Squeezing the Most Utility from Your Models” and “Estimating Uncertainty of Utility Curves.” This is very compatible with our advice to […]
Recently, we showed how to use utility estimates to pick good classifier thresholds. In that article, we used model performance on an evaluation set, combined with estimates of rewards and penalties for correct and incorrect classifications, to find a threshold that optimized model utility. In this article, we will show […]
In a previous article we discussed why it’s a good idea to prefer probability models to “hard” classification models, and why you should delay setting “hard” classification rules as long as possible. But decisions have to be made, and eventually you will have to set that threshold. How do you […]
I want to talk about a misconception on the difference between inference and prediction. For a well-run, analytically oriented business, there may not be as many reasons to prefer inference over prediction as one may have heard. A common refrain is: data scientists are in error in centering so much […]
An update on site maintenance. We have moved the Win Vector LLC site to https hosting. Some links were damaged by the transition, but we are fixing them as we find them. Overall, the following changes are present. File content under https://win-vector.com/dfiles/ has moved to https://github.com/WinVector/Examples/tree/main/dfiles, and the blog is here: https://win-vector.com/blog-2/. […]