We’ve been experimenting with this for a while, and the next R vtreat package will have a back-port of the Python vtreat package sklearn pipe step interface (in addition to the standard R interface).
For quite a while we have been teaching estimating variable re-encodings on the exact same data they are later naively using to train a model on, leads to an undesirable nested model bias. The vtreat package (both the R version and Python version) both incorporate a cross-frame method that allows […]
Video of our PyData Los Angeles 2019 talk Preparing Messy Real World Data for Supervised Machine Learning is now available. In this talk describe how to use vtreat, a package available in R and in Python, to correctly re-code real world data for supervised machine learning tasks. Please check it […]
Slides for PyData LA 2019 vtreat Talk are here!
As we have announced before, we have ported the R version of vtreat to a new Python version of vtreat. Our latest news is: we are speaking about the Python version at PyData LA 2019 (Thursday 10:50 AM–11:35 AM in Track 2 Room).
Regularization is a way of avoiding overfit by restricting the magnitude of model coefficients (or in deep learning, node weights). A simple example of regularization is the use of ridge or lasso regression to fit linear models in the presence of collinear variables or (quasi-)separation. The intuition is that smaller […]
Nina Zumel finished some great new documentation showing how to use Python vtreat to prepare data for multinomial classification mode. And I have finally finished porting the documentation to R vtreat. So we now have good introductions on how to use vtreat to prepare data for the common tasks of: […]
Nina Zumel has been polishing up new vtreat for Python documentation and tutorials. They are coming out so good that I find to be fair to the R community I must start to back-port this new documentation to vtreat for R.
Win Vector LLC‘s Dr. Nina Zumel has just released some new vtreat documentation. vtreat is a an all-in one step data preparation system that helps defend your machine learning algorithms from: Missing values Large cardinality categorical variables Novel levels from categorical variables I hoped she could get the Python vtreat […]
I am excited to announce vtreat is now available for Python on PyPi, in addition for R on CRAN.