Nina Zumel just completed an excellent short sequence of articles on picking optimal utility thresholds to convert a continuous model score for a classification problem into a deployable classification rule.
This fits well with our advice to prefer continuous scoring models, and probability models in particular, to “hard” classification rules during model development for classification problems. In fact, her point is exactly the benefit: by delaying the conversion of a continuous score to a classification rule, you may be able to use better, more business-oriented criteria to make the conversion. And you may be able to change the rule when utilities or prevalences change.
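To make the idea concrete, here is a minimal sketch (not Nina's code) of turning a score into a rule by picking the threshold with the highest total utility. The function name `best_threshold`, the utility dictionary keys, and the toy data are all assumptions made for illustration.

```python
def best_threshold(scores, labels, utilities):
    """Return (threshold, total_utility) for the best rule 'score >= threshold'.

    utilities is a hypothetical dict of per-example values with keys
    'tp', 'fp', 'tn', 'fn' (true/false positive/negative).
    """
    # Every observed score is a candidate; infinity means "never predict positive".
    candidates = sorted(set(scores)) + [float("inf")]
    best = None
    for t in candidates:
        total = 0.0
        for s, y in zip(scores, labels):
            predicted_positive = s >= t
            if predicted_positive and y:
                total += utilities["tp"]
            elif predicted_positive and not y:
                total += utilities["fp"]
            elif y:
                total += utilities["fn"]
            else:
                total += utilities["tn"]
        # Keep the first threshold that achieves the highest total utility.
        if best is None or total > best[1]:
            best = (t, total)
    return best


# Toy data (made up): model scores and true 0/1 outcomes.
scores = [0.1, 0.2, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 1, 0, 1, 0, 1, 1]

cheap_mistakes = {"tp": 10, "fp": -2, "tn": 0, "fn": 0}
costly_mistakes = {"tp": 10, "fp": -20, "tn": 0, "fn": 0}

print(best_threshold(scores, labels, cheap_mistakes))   # (0.35, 36.0)
print(best_threshold(scores, labels, costly_mistakes))  # (0.8, 20.0)
```

The same scores give two different rules: when false positives are cheap the best threshold is low, and when they are expensive it moves up. That is only possible because the score was kept continuous until the utilities were known.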
For completeness, we are sharing an example of how to reproduce her calculation on the ROC plot itself. This is a bit more of a “standard way of doing things”, but I think it is in fact didactically inferior to her method of working more directly in terms of utilities. So to see how to work with the ROC plot, please check out our new note here.
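As a rough illustration of the ROC-plot view (a sketch under my own assumptions, not a copy of the note), average utility is linear in the false positive rate and true positive rate once prevalence and the four utilities are fixed. So lines of equal utility on the ROC plot are straight, and the best rule is the ROC point touching the highest such line. All function names and the toy data below are hypothetical.

```python
def roc_points(scores, labels):
    """List of (threshold, FPR, TPR) for the rule 'score >= threshold'."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = []
    for t in sorted(set(scores)) + [float("inf")]:
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and not y)
        points.append((t, fp / n_neg, tp / n_pos))
    return points


def mean_utility(fpr, tpr, prevalence, u):
    """Average utility per example at one ROC point."""
    on_positives = tpr * u["tp"] + (1 - tpr) * u["fn"]
    on_negatives = fpr * u["fp"] + (1 - fpr) * u["tn"]
    return prevalence * on_positives + (1 - prevalence) * on_negatives


def iso_utility_slope(prevalence, u):
    """Slope (dTPR/dFPR) of the equal-utility lines on the ROC plot."""
    return ((1 - prevalence) / prevalence) * (u["tn"] - u["fp"]) / (u["tp"] - u["fn"])


def best_roc_point(scores, labels, prevalence, u):
    """ROC point (threshold, FPR, TPR) with the highest average utility."""
    return max(roc_points(scores, labels),
               key=lambda p: mean_utility(p[1], p[2], prevalence, u))


# Toy data (made up): model scores and true 0/1 outcomes.
scores = [0.1, 0.2, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 1, 0, 1, 0, 1, 1]
u = {"tp": 10, "fp": -2, "tn": 0, "fn": 0}

print(best_roc_point(scores, labels, 0.5, u))  # (0.35, 0.5, 1.0)
print(best_roc_point(scores, labels, 0.1, u))  # (0.8, 0.0, 0.5)
```

Lowering the assumed deployment prevalence from 0.5 to 0.1 steepens the equal-utility lines (`iso_utility_slope` goes from about 0.2 to about 1.8), which pushes the chosen point toward the low false-positive corner of the plot without refitting the model.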
Figure from: “ROC optimization”
Data Scientist and trainer at Win Vector LLC. One of the authors of Practical Data Science with R.