vtreat is an essential data preparation system for predictive modeling that helps defend your modeling work against real-world data issues, including:
- High cardinality categorical variables
- Rare levels (including new or novel levels during application) in categorical variables
- Missing data (random or systematic)
- Irrelevant variables/columns
- Nested model bias and other over-fitting issues
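To make a couple of the issues above concrete, here is a minimal sketch of handling missing numeric values and rare or novel categorical levels. It is not vtreat's API or implementation (vtreat is an R package). The function names `fit_numeric`, `apply_numeric`, `fit_levels`, and `apply_levels`, the `min_count` threshold, and the `_rare_or_novel_` label are all hypothetical and chosen only for illustration. The sketch does not address nested model bias.

```python
from collections import Counter
from statistics import mean


def fit_numeric(values):
    """Learn a fill-in value (mean of the observed values) from training data."""
    observed = [v for v in values if v is not None]
    return mean(observed) if observed else 0.0


def apply_numeric(values, fill):
    """Replace missing values and return a parallel missingness indicator."""
    cleaned = [fill if v is None else v for v in values]
    is_missing = [1 if v is None else 0 for v in values]
    return cleaned, is_missing


def fit_levels(values, min_count=2):
    """Keep only levels seen at least min_count times in training data."""
    counts = Counter(v for v in values if v is not None)
    return {level for level, n in counts.items() if n >= min_count}


def apply_levels(values, kept, other="_rare_or_novel_"):
    """Map rare, novel, or missing levels to a single pooled level."""
    return [v if v in kept else other for v in values]


# Hypothetical training data: one numeric and one categorical column.
train_x = [1.0, None, 3.0, 5.0]
train_c = ["a", "a", "b", "a"]

fill = fit_numeric(train_x)
kept = fit_levels(train_c)

# Hypothetical application data, including a level ("z") never seen in training.
cleaned, is_missing = apply_numeric([None, 2.0], fill)
encoded = apply_levels(["a", "b", "z", None], kept)
```

The point of the design is that everything is learned from the training data and then applied unchanged to new data. A level that shows up only at application time therefore falls into the pooled level instead of causing an error. The indicator column keeps the fact that a value was missing available to the model, which matters when missingness is systematic.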
vtreat also includes excellent, citable documentation: "vtreat: a data.frame Processor for Predictive Modeling".
For this release I want to thank everybody who generously donated their time to submit an issue or a pull request. In particular:
- Vadim Khotilovich, who found and fixed a major performance problem in the y-stratified sampling.
- Lawrence Wu, who has been donating documentation fixes.
- Peter Hurford, who has been donating documentation fixes.
Data Scientist and trainer at Win Vector LLC. One of the authors of Practical Data Science with R.