For classification problems I argue one of the biggest steps you can take to improve the quality and utility of your models is to prefer models that return scores or return probabilities instead of classification rules. Doing this also opens a second large opportunity for improvement: working with your domain […]
Estimated reading time: 19 minutes
This note is a little break from our model homotopy series. I have a neat example where one combines two classifiers to get a better classifier using a method I am calling “ROC surgery.” In ROC surgery we look at multiple ROC plots and decide we want to cut out […]
Estimated reading time: 40 seconds
Nina Zumel just completed an excellent short sequence of articles on picking optimal utility thresholds to convert a continuous model score for a classification problem into a deployable classification rule. Squeezing the Most Utility from Your Models Estimating Uncertainty of Utility Curves This is very compatible with our advice to […]
Estimated reading time: 1 minute
I am finishing up a work-note that has some really neat implications as to why working with AUC is more powerful than one might think. I think I am far enough along to share the consequences here. This started as some, now reappraised, thoughts on the fallacy of thinking knowing […]
Estimated reading time: 3 minutes
Recently Microsoft Data Scientist Bob Horton wrote a very nice article on ROC plots. We expand on this a bit and discuss some of the issues in computing “area under the curve” (AUC).
Estimated reading time: 5 minutes
A bit more on the ROC/AUC The issue The receiver operating characteristic curve (or ROC) is one of the standard methods to evaluate a scoring system. Nina Zumel has described its application, but I would like to call out some additional details. In my opinion while the ROC is a […]
Estimated reading time: 16 minutes