Recently, we showed how to use utility estimates to pick good classifier thresholds. In that article, we used model performance on an evaluation set, combined with estimates of rewards and penalties for correct and incorrect classifications, to find a threshold that optimized model utility. In this article, we will show […]
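As a rough illustration of the idea in the teaser above, here is a minimal sketch of picking a threshold by utility on an evaluation set. It is not the code from the article: the function name `best_threshold`, the four per-outcome values, and the toy data are all hypothetical, and the search simply tries each observed score as a candidate threshold.

```python
import numpy as np

def best_threshold(scores, labels, tp_value, fp_value, fn_value, tn_value):
    """Return the (threshold, utility) pair with the highest total utility.

    scores: model scores on an evaluation set (higher means more likely positive)
    labels: true outcomes (truthy for positive)
    *_value: assumed reward (positive) or penalty (negative) per outcome type
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    # Candidates: every observed score, plus +inf ("never predict positive").
    candidates = np.append(np.unique(scores), np.inf)
    best_t, best_u = None, -np.inf
    for t in candidates:
        predicted = scores >= t
        tp = np.sum(predicted & labels)
        fp = np.sum(predicted & ~labels)
        fn = np.sum(~predicted & labels)
        tn = np.sum(~predicted & ~labels)
        utility = tp * tp_value + fp * fp_value + fn * fn_value + tn * tn_value
        if utility > best_u:
            best_t, best_u = t, utility
    return best_t, best_u

# Toy evaluation set and made-up rewards and penalties, for illustration only.
threshold, utility = best_threshold(
    scores=[0.1, 0.4, 0.6, 0.9],
    labels=[0, 0, 1, 1],
    tp_value=10, fp_value=-5, fn_value=-10, tn_value=0,
)
print(threshold, utility)
```

The chosen threshold depends entirely on the value estimates, so changing the rewards and penalties can move it even when the model and the evaluation data stay the same.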
In a previous article we discussed why it’s a good idea to prefer probability models to “hard” classification models, and why you should delay setting “hard” classification rules as long as possible. But decisions have to be made, and eventually you will have to set that threshold. How do you […]
There’s a common, yet easily fixed, mistake that I often see in machine learning and data science projects and teaching: using classification rules for classification problems. This statement is a bit of word-play, which I will need to unpack. However, the concrete advice is that you often […]
I would like to share a video where we show how to use the vtreat data transformer in the KNIME data science platform (and we see there is an R/vtreat KNIME example here!).
Chapter 8 “Advanced Data Preparation” of Practical Data Science with R is a study in using the R vtreat package for advanced data preparation, and in cross-validated data preparation. It is the professionally edited, ready-to-cite version of an important data preparation methodology. An advantage being: a number of well-documented […]
We have some really nifty upcoming enhancements to wrapr unpack/to.
One reason we are developing the wrapr to/unpack methods is the following: we wanted to spruce up the R vtreat interface a bit.
I would like to introduce an exciting feature in the upcoming 1.9.6 version of the wrapr R package: value unpacking.
We’ve been experimenting with this for a while, and the next release of the R vtreat package will have a back-port of the Python vtreat package’s sklearn pipe step interface (in addition to the standard R interface).
Video of our PyData Los Angeles 2019 talk Preparing Messy Real World Data for Supervised Machine Learning is now available. In this talk we describe how to use vtreat, a package available in R and in Python, to correctly re-code real world data for supervised machine learning tasks. Please check it […]